2017-09-08

Abstract

We show that any Bernoulli random variable is sub-Gaussian with variance factor \(\frac{1}{4}\).

A random variable \(X \sim \mathop{\mathrm{\mathbb P}}\) is *sub-Gaussian* with variance factor \(\nu\) if \[\ln \mathop{\mathrm{\mathbb E}}\left[e^{\eta(X-\mathop{\mathrm{\mathbb E}}[X])}\right] ~\le~ \frac{\eta^2 \nu}{2} \qquad \text{for all $\eta \in \mathbb R$}
.\] The reason for the name is that Gaussians satisfy this with equality. In this post we consider Bernoulli \(X \sim \mathcal B(\theta)\) (so \(X=1\) with probability \(\theta\) and \(X=0\) with probability \(1-\theta\)). Then \(\mathop{\mathrm{\mathbb E}}[X] = \theta\). We find \begin{align}
\notag
\Psi_\theta(\eta)
&~:=~
\ln \mathop{\mathrm{\mathbb E}}\left[e^{\eta(X-\theta)}\right]
\\
\notag
&~=~
\ln \left(
\theta e^{\eta(1-\theta)}
+
(1-\theta) e^{\eta(0-\theta)}
\right)
\\
\label{eq:show.ccv}
&~=~
- \eta \theta +
\ln \left(
1 + \theta (e^\eta-1)
\right) \end{align} To get a bound that does not depend on \(\theta\), let us look at \[\Psi(\eta)
~:=~
\max_{\theta \in [0,1]} \Psi_\theta(\eta)
.\] The expression \(\eqref{eq:show.ccv}\) for \(\Psi_\theta(\eta)\) implies that it is concave in \(\theta\): it is the sum of a linear function of \(\theta\) and the logarithm of an affine function of \(\theta\). So it is maximised where the derivative in \(\theta\) vanishes, namely at \(\theta =
\frac{1}{\eta } - \frac{1}{e^\eta -1}\), resulting in \begin{align*}
\Psi(\eta)
&~=~
- 1
+ \frac{\eta}{e^{\eta }-1}
- \ln \frac{\eta }{e^{\eta }-1}
\end{align*} A series expansion reveals that \(\Psi(\eta) \approx \frac{\eta^2}{8}\) around \(\eta = 0\). It turns out (see e.g. Proposition 4 in this earlier post) that this is an actual upper bound: \[\Psi(\eta) ~\le~ \frac{\eta^2}{8}
\qquad
\text{for all $\eta \in \mathbb R$}
.\] This allows us to conclude that any Bernoulli variable is sub-Gaussian with variance factor \(\nu = \frac{1}{4}\).
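As a quick numerical sanity check (a sketch in Python; the grid ranges and the helper name `psi_theta` are my own choices, not from the post), one can evaluate \(\Psi_\theta(\eta)\) directly from its definition and confirm it never exceeds \(\frac{\eta^2}{8}\):

```python
import math

def psi_theta(theta, eta):
    """Centred log-MGF of Bernoulli(theta): ln E[exp(eta*(X - theta))]."""
    # E[exp(eta*(X - theta))] = theta*e^{eta(1-theta)} + (1-theta)*e^{-eta*theta},
    # which simplifies to exp(-eta*theta) * (1 + theta*(e^eta - 1)).
    return -eta * theta + math.log1p(theta * math.expm1(eta))

# Sweep theta over (0, 1) and eta over [-8, 8]; record the worst violation
# of the claimed bound psi_theta(theta, eta) <= eta^2 / 8.
worst_gap = max(
    psi_theta(i / 100, j / 10) - (j / 10) ** 2 / 8
    for i in range(1, 100)
    for j in range(-80, 81)
)
print(worst_gap <= 1e-12)  # the bound holds at every grid point
```

The gap is zero at \(\eta = 0\) (and, for \(\theta = \frac{1}{2}\), only of order \(\eta^4\) near zero), consistent with the series expansion above.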

Note that a careful application of Hoeffding’s Inequality (Cesa-Bianchi and Lugosi 2006, Lemma A.1.1) would also give this. We would observe that \(X-\theta \in [0-\theta, 1-\theta]\), so the width of the range is one. (The less careful application would say \(X-\theta \in [-1,1]\), of width two, and the \(\frac{1}{8}\) would become \(\frac{1}{2}\), resulting in a variance factor of only \(\nu=1\).)
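This width comparison can be sketched in a few lines (the helper name is mine; I assume Hoeffding's lemma in the standard form \(\ln \mathop{\mathrm{\mathbb E}}\left[e^{\eta(X-\mathop{\mathrm{\mathbb E}}[X])}\right] \le \frac{\eta^2 (b-a)^2}{8}\) for \(X \in [a,b]\)):

```python
def hoeffding_variance_factor(width):
    """Variance factor nu implied by Hoeffding's lemma for X with range of
    the given width: eta^2 * width^2 / 8 = eta^2 * nu / 2, so nu = width^2 / 4."""
    return width ** 2 / 4

print(hoeffding_variance_factor(1))  # careful range [-theta, 1-theta], width one: 0.25
print(hoeffding_variance_factor(2))  # careless range [-1, 1], width two: 1.0
```

Since the variance factor scales with the squared width, the careless range costs a factor of four.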

Cesa-Bianchi, Nicolò, and Gábor Lugosi. 2006. *Prediction, Learning, and Games*. Cambridge University Press.