# Introduction

A random variable $$X \sim \operatorname*{\mathbb P}$$ is sub-Gaussian with variance factor $$\nu$$ if

$$\ln \operatorname*{\mathbb E}{\left[e^{\eta(X-\operatorname*{\mathbb E}[X])}\right]} ~\le~ \frac{\eta^2 \nu}{2} \qquad \text{for all } \eta \in \mathbb R .$$

The name comes from the fact that Gaussians satisfy this with equality. In this post we consider Bernoulli $$X \sim \mathcal B(\theta)$$ (so $$X=1$$ with probability $$\theta$$ and $$X=0$$ with probability $$1-\theta$$). Then $$\operatorname*{\mathbb E}[X] = \theta$$, and we find

\begin{align}
\notag
\Psi_\theta(\eta)
&~:=~ \ln \operatorname*{\mathbb E}{\left[e^{\eta(X-\theta)}\right]}
\\ \notag
&~=~ \ln {\left( \theta e^{\eta(1-\theta)} + (1-\theta) e^{\eta(0-\theta)} \right)}
\\ \label{eq:show.ccv}
&~=~ - \eta \theta + \ln {\left( 1 + \theta (e^\eta-1) \right)}
.
\end{align}

To get a bound that does not depend on $$\theta$$, let us look at

$$\Psi(\eta) ~:=~ \max_{\theta \in [0,1]} \Psi_\theta(\eta) .$$

The expression for $$\Psi_\theta(\eta)$$ is concave in $$\theta$$: it is a linear term plus the logarithm of an affine function of $$\theta$$. So it is maximised at zero derivative, i.e. at $$\theta = \frac{1}{\eta} - \frac{1}{e^\eta - 1}$$, resulting in

\begin{align*}
\Psi(\eta)
&~=~ - 1 + \frac{\eta}{e^{\eta}-1} - \ln \frac{\eta}{e^{\eta}-1}
.
\end{align*}

A series expansion reveals that $$\Psi(\eta) \approx \frac{\eta^2}{8}$$ around $$\eta = 0$$. It turns out (see e.g. Proposition 4 in this earlier post) that this is an actual upper bound:

$$\Psi(\eta) ~\le~ \frac{\eta^2}{8} \qquad \text{for all } \eta \in \mathbb R .$$

This allows us to conclude that any Bernoulli random variable is sub-Gaussian with variance factor $$\nu = \frac{1}{4}$$.
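The derivation above lends itself to a quick numerical sanity check. The sketch below (function names are my own) evaluates the centered-Bernoulli cumulant generating function $$\Psi_\theta(\eta)$$, plugs in the maximiser $$\theta^* = \frac{1}{\eta} - \frac{1}{e^\eta-1}$$, and verifies on a grid both that $$\theta^*$$ dominates other choices of $$\theta$$ and that $$\Psi(\eta) \le \frac{\eta^2}{8}$$:

```python
import math

def psi_theta(eta, theta):
    """Centered-Bernoulli CGF: -eta*theta + ln(1 + theta*(e^eta - 1))."""
    return -eta * theta + math.log1p(theta * math.expm1(eta))

def psi(eta):
    """Worst-case CGF over theta in [0, 1], using the maximiser
    theta* = 1/eta - 1/(e^eta - 1). (theta* tends to 1/2 as eta -> 0.)"""
    if eta == 0.0:
        return 0.0
    theta_star = 1.0 / eta - 1.0 / math.expm1(eta)
    return psi_theta(eta, theta_star)

for i in range(-40, 41):
    eta = i / 4.0
    # theta* should beat every theta on a grid ...
    grid_max = max(psi_theta(eta, t / 200.0) for t in range(201))
    assert psi(eta) >= grid_max - 1e-9
    # ... and the worst case should stay below eta^2 / 8.
    assert psi(eta) <= eta**2 / 8.0 + 1e-12
```

Using `expm1` and `log1p` keeps the evaluation accurate near $$\eta = 0$$, where both $$e^\eta - 1$$ and the argument of the logarithm are small.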

Note that a careful application of Hoeffding’s Inequality (Cesa-Bianchi and Lugosi 2006, Lemma A.1.1) would also give this. We would observe that $$X-\theta \in [0-\theta, 1-\theta]$$, and hence the width of the range is one. (The less careful application would say $$X-\theta \in [-1,1]$$, of width two, and the $$\frac{1}{8}$$ would become $$\frac{1}{2}$$, resulting in a variance factor of only $$\nu=1$$.)
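The factor-of-four gap between the two applications is easy to see numerically. A small check (helper name is my own) confirms that the exact CGF sits below the careful bound $$\frac{\eta^2}{8}$$, which in turn sits below the sloppy bound $$\frac{\eta^2}{2}$$:

```python
import math

def psi_theta(eta, theta):
    # Centered-Bernoulli CGF from the derivation above.
    return -eta * theta + math.log1p(theta * math.expm1(eta))

for eta in (-3.0, -1.0, -0.25, 0.5, 2.0):
    for theta in (0.1, 0.5, 0.9):
        careful = eta**2 / 8.0  # range [-theta, 1-theta], width 1 -> nu = 1/4
        loose = eta**2 / 2.0    # range [-1, 1], width 2 -> nu = 1
        assert psi_theta(eta, theta) <= careful <= loose
```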

Cesa-Bianchi, Nicolò, and Gábor Lugosi. 2006. Prediction, Learning, and Games. Cambridge University Press.