# Introduction

A random variable $$X \sim \operatorname*{\mathbb P}$$ is sub-Gaussian with variance factor $$\nu$$ if

$$\ln \operatorname*{\mathbb E}{\left[e^{\eta(X-\operatorname*{\mathbb E}[X])}\right]} ~\le~ \frac{\eta^2 \nu}{2} \qquad \text{for all } \eta \in \mathbb R .$$

The name comes from the fact that Gaussians satisfy this with equality. In this post we consider Bernoulli $$X \sim \mathcal B(\theta)$$ (so $$X=1$$ with probability $$\theta$$ and $$X=0$$ with probability $$1-\theta$$). Then $$\operatorname*{\mathbb E}[X] = \theta$$, and we find

\begin{align}
\notag
\Psi_\theta(\eta)
&~:=~ \ln \operatorname*{\mathbb E}{\left[e^{\eta(X-\theta)}\right]}
\\ \notag
&~=~ \ln {\left( \theta e^{\eta(1-\theta)} + (1-\theta) e^{\eta(0-\theta)} \right)}
\\ \label{eq:show.ccv}
&~=~ - \eta \theta + \ln {\left( 1 + \theta (e^\eta-1) \right)}
.
\end{align}

To get a bound that does not depend on $$\theta$$, let us look at

$$\Psi(\eta) ~:=~ \max_{\theta \in [0,1]} \Psi_\theta(\eta) .$$

The expression for $$\Psi_\theta(\eta)$$ is concave in $$\theta$$: it is a linear term plus the logarithm of an affine function of $$\theta$$. So it is maximised at zero derivative, i.e. at $$\theta = \frac{1}{\eta} - \frac{1}{e^\eta - 1}$$, resulting in

\begin{align*}
\Psi(\eta)
&~=~ - 1 + \frac{\eta}{e^{\eta}-1} - \ln \frac{\eta}{e^{\eta}-1}
.
\end{align*}

A series expansion reveals that $$\Psi(\eta) \approx \frac{\eta^2}{8}$$ around $$\eta = 0$$. It turns out (see e.g. Proposition 4 in this earlier post) that this is an actual upper bound:

$$\Psi(\eta) ~\le~ \frac{\eta^2}{8} \qquad \text{for all } \eta \in \mathbb R .$$

This allows us to conclude that any Bernoulli random variable is sub-Gaussian with variance factor $$\nu = \frac{1}{4}$$.
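The derivation above lends itself to a quick numerical sanity check. The sketch below (function names are my own) evaluates the centered-Bernoulli cumulant generating function $$\Psi_\theta(\eta)$$, plugs in the maximiser $$\theta^* = \frac{1}{\eta} - \frac{1}{e^\eta-1}$$, and verifies on a grid both that $$\theta^*$$ dominates other choices of $$\theta$$ and that $$\Psi(\eta) \le \frac{\eta^2}{8}$$:

```python
import math

def psi_theta(eta, theta):
    """Centered-Bernoulli CGF: -eta*theta + ln(1 + theta*(e^eta - 1))."""
    return -eta * theta + math.log1p(theta * math.expm1(eta))

def psi(eta):
    """Worst-case CGF over theta in [0, 1], using the maximiser
    theta* = 1/eta - 1/(e^eta - 1). (theta* tends to 1/2 as eta -> 0.)"""
    if eta == 0.0:
        return 0.0
    theta_star = 1.0 / eta - 1.0 / math.expm1(eta)
    return psi_theta(eta, theta_star)

for i in range(-40, 41):
    eta = i / 4.0
    # theta* should beat every theta on a grid ...
    grid_max = max(psi_theta(eta, t / 200.0) for t in range(201))
    assert psi(eta) >= grid_max - 1e-9
    # ... and the worst case should stay below eta^2 / 8.
    assert psi(eta) <= eta**2 / 8.0 + 1e-12
```

Using `expm1` and `log1p` keeps the evaluation accurate near $$\eta = 0$$, where both $$e^\eta - 1$$ and the argument of the logarithm are small.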

Note that a careful application of Hoeffding’s Inequality (Cesa-Bianchi and Lugosi 2006, Lemma A.1.1) would also give this. We would observe that $$X-\theta \in [0-\theta, 1-\theta]$$, and hence the width of the range is one. (The less careful application would say $$X-\theta \in [-1,1]$$, of width two, and the $$\frac{1}{8}$$ would become $$\frac{1}{2}$$, resulting in a variance factor of only $$\nu=1$$.)
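The factor-of-four gap between the two applications is easy to see numerically. A small check (helper name is my own) confirms that the exact CGF sits below the careful bound $$\frac{\eta^2}{8}$$, which in turn sits below the sloppy bound $$\frac{\eta^2}{2}$$:

```python
import math

def psi_theta(eta, theta):
    # Centered-Bernoulli CGF from the derivation above.
    return -eta * theta + math.log1p(theta * math.expm1(eta))

for eta in (-3.0, -1.0, -0.25, 0.5, 2.0):
    for theta in (0.1, 0.5, 0.9):
        careful = eta**2 / 8.0  # range [-theta, 1-theta], width 1 -> nu = 1/4
        loose = eta**2 / 2.0    # range [-1, 1], width 2 -> nu = 1
        assert psi_theta(eta, theta) <= careful <= loose
```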

Cesa-Bianchi, Nicolò, and Gábor Lugosi. 2006. Prediction, Learning, and Games. Cambridge University Press.