2017-04-26
The Law of the Iterated Logarithm is an asymptotic uniform deviation bound. We prove a finite-time version using martingale arguments.
Let \(X_1, X_2, \ldots\) be i.i.d. random variables with zero mean and unit variance. Let \(S_t = \sum_{i=1}^t X_i\). The Law of the Iterated Logarithm says that \[\limsup_{t \to \infty}~ \frac{S_t}{\sqrt{t \ln \ln t}} ~=~ \sqrt{2} \qquad \text{almost surely}.\] In this post we look at a finite-time version of this statement. We try to keep it simple and focus on the basics. To bring out one of the central ideas of the argument I will prove the following for sub-Gaussian variables (defined below).
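(As a quick empirical illustration, not needed for anything that follows, here is a minimal simulation sketch, assuming NumPy, that tracks the LIL ratio \(S_t / \sqrt{t \ln \ln t}\) along a single Gaussian random walk; for moderately large \(t\) its running maximum should be of the order \(\sqrt{2} \approx 1.414\).)

```python
import numpy as np

rng = np.random.default_rng(0)
T = 10**6
S = np.cumsum(rng.standard_normal(T))    # S_t for t = 1, ..., T
t = np.arange(100, T + 1)                # skip small t, where ln ln t is tiny
ratio = S[99:] / np.sqrt(t * np.log(np.log(t)))
print("max of S_t / sqrt(t ln ln t):", ratio.max())   # of the order sqrt(2)
```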
Proposition 1. Let \(X_1,X_2,\ldots\) be i.i.d. sub-Gaussian with zero mean and variance factor \(\nu = 1\). For each \(\delta \in (0,1)\), we have \[\mathop{\mathrm{\mathbb P}}\left( \exists t : S_t ~\ge~ \sqrt{2.07 t \ln \frac{(2+\log_2 t)^2}{\delta}} \right) ~\le~ \delta .\]
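Before diving into the proof, here is a hedged sanity check of Proposition 1 (my own illustration, over a finite horizon only): simulate many standard Gaussian random walks, which are sub-Gaussian with \(\nu = 1\), and count how often the boundary is ever crossed.

```python
import numpy as np

rng = np.random.default_rng(1)
delta, horizon, n_paths = 0.05, 10_000, 2_000
t = np.arange(1, horizon + 1)
# Boundary from Proposition 1 (with nu = 1).
boundary = np.sqrt(2.07 * t * np.log((2 + np.log2(t)) ** 2 / delta))

crossings = 0
for _ in range(n_paths):
    S = np.cumsum(rng.standard_normal(horizon))
    crossings += np.any(S >= boundary)
print(f"empirical crossing rate: {crossings / n_paths:.4f} (should be <= {delta})")
```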
See (Balsubramani 2014) for much more refined statements and techniques. For this post I am particularly interested in the method of proof, which is based on martingales. But let’s start by defining sub-Gaussian random variables.
A random variable \(X \sim \mathop{\mathrm{\mathbb P}}\) is sub-Gaussian with variance factor \(\nu \ge 0\) if \begin{equation}\label{eq:subgaussian} \mathop{\mathrm{\mathbb E}}\left[e^{\eta(X-\mathop{\mathrm{\mathbb E}}[X])}\right] ~\le~ e^{\frac{\eta^2 \nu}{2}} \qquad \text{for all $\eta \in \mathbb R$}. \end{equation} The two main examples we consider here are Gaussian and Bernoulli variables. If \(X \sim \mathcal N(\mu, \sigma^2)\), then indeed \(\eqref{eq:subgaussian}\) holds with \(\nu = \sigma^2\). If \(X \sim \mathcal B(\theta)\) then \(\eqref{eq:subgaussian}\) holds with \(\nu = \frac{1}{4}\). For the latter see e.g. this post.
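The Bernoulli claim is easy to check numerically. The following sketch (assuming NumPy; the grid ranges are my own choice) verifies \(\eqref{eq:subgaussian}\) with \(\nu = 1/4\) for \(X \sim \mathcal B(\theta)\) over a grid of \(\eta\) and \(\theta\).

```python
import numpy as np

nu = 0.25                                 # claimed variance factor for Bernoulli
etas = np.linspace(-10, 10, 401)
for theta in np.linspace(0.05, 0.95, 19):
    # E[exp(eta (X - theta))] for X ~ Bernoulli(theta)
    mgf = theta * np.exp(etas * (1 - theta)) + (1 - theta) * np.exp(-etas * theta)
    assert np.all(mgf <= np.exp(etas ** 2 * nu / 2) + 1e-12)
print("MGF bound holds on the whole grid")
```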
Let the sequence of random variables \(X_1, X_2, \ldots\) be i.i.d. sub-Gaussian with variance factor \(\nu\) and common mean \(\mathop{\mathrm{\mathbb E}}[X_i] = \mu\). Let \(S_t = \sum_{i=1}^t (X_i - \mu)\) be the centred sum of the first \(t\) outcomes. Let \(\gamma\) be a probability density on \(\eta \in \mathbb R\) and consider \[Z_t ~=~ \int \gamma(\eta) e^{\eta S_t - t \frac{\nu \eta^2}{2}} \textrm{d}\eta .\] First of all, \(Z_t\) is a supermartingale (under \(\mathop{\mathrm{\mathbb P}}\)). Why? Well, \begin{align*} \mathop{\mathrm{\mathbb E}}\left[Z_{t+1}\middle|X^t\right] &~=~ \mathop{\mathrm{\mathbb E}}\left[ \int \gamma(\eta) e^{\eta (S_t + X_{t+1} - \mu) - \frac{t+1}{2} \nu \eta^2} \textrm{d}\eta\middle|X^t\right] \\ &~=~ \int \gamma(\eta) e^{\eta S_t - \frac{t}{2} \nu \eta^2} \mathop{\mathrm{\mathbb E}}\left[e^{\eta (X_{t+1} - \mu) - \frac{\nu \eta^2}{2}} \middle|X^t\right] \textrm{d}\eta \\ &~\stackrel{\eqref{eq:subgaussian}}{\le}~ \int \gamma(\eta) e^{\eta S_t - t \frac{\nu \eta^2}{2}} \textrm{d}\eta ~=~ Z_t . \end{align*} It is obvious that \(Z_t\) is non-negative and \(Z_0 = 1\). It then follows, by Doob’s maximal inequality for non-negative supermartingales (also known as Ville’s inequality), that \begin{equation}\label{eq:doob} \mathop{\mathrm{\mathbb P}}\left( \exists t : Z_t ~\ge~ \frac{1}{\delta} \right) \le \delta . \end{equation} This inequality is what will give us the quantification over all \(t\) inside the event.
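To see the supermartingale and \(\eqref{eq:doob}\) at work, here is a small simulation sketch (assuming NumPy; the discrete grid of \(\eta\) values and the uniform weights are my own choice, not the prior used below): it estimates \(\mathop{\mathrm{\mathbb P}}(\exists t \le T : Z_t \ge 1/\delta)\), which should come out below \(\delta\).

```python
import numpy as np

rng = np.random.default_rng(2)
nu, delta, horizon, n_paths = 1.0, 0.1, 5_000, 1_000
etas = np.array([0.01, 0.03, 0.1, 0.3])
weights = np.full(len(etas), 1 / len(etas))   # uniform discrete prior, Z_0 = 1

exceed = 0
t = np.arange(1, horizon + 1)
for _ in range(n_paths):
    S = np.cumsum(rng.standard_normal(horizon))
    # Z_t = sum_i gamma_i exp(eta_i S_t - t nu eta_i^2 / 2), for every t at once
    Z = (weights * np.exp(np.outer(S, etas) - np.outer(t, nu * etas**2 / 2))).sum(axis=1)
    exceed += np.any(Z >= 1 / delta)
print(f"P(exists t: Z_t >= 1/delta) ~ {exceed / n_paths:.3f} (should be <= {delta})")
```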
We now consider the choice of prior \(\gamma\). The main guiding principle is convenience of analysis, so we pick a discrete prior. Let us put prior mass \(\gamma_i\) on \(\eta_i\) for \(i = 1, 2, \ldots\) Interestingly, it turns out to be handy to use \(\delta\) in the construction of \(\gamma\). (This would seem to violate Bayesian philosophy, but that does not concern us here.) The specific prior that we will consider is \[\gamma_i ~=~ \frac{1}{i(i+1)} \qquad \text{and} \qquad \eta_i ~=~ \sqrt{\frac{2 \ln \frac{1}{\gamma_i \delta}}{2^i \nu}} .\] Now since \[Z_t ~=~ \sum_i \gamma_i e^{\eta_i S_t - t \frac{\nu \eta_i^2}{2}} ~\ge~ \max_i~ \gamma_i e^{\eta_i S_t - t \frac{\nu \eta_i^2}{2}} ,\] we have \begin{align*} \left\{Z_t \ge \frac{1}{\delta}\right\} &~\supseteq~ \left\{\max_i \gamma_i e^{\eta_i S_t - t \frac{\nu \eta_i^2}{2}} \ge \frac{1}{\delta}\right\} \\ &~=~ \left\{ \frac{S_t}{t} \ge \min_i \frac{\nu \eta_i}{2} + \frac{1}{\eta_i t} \ln \frac{1}{\gamma_i \delta} \right\} && \text{(take logarithms and rearrange)} \\ &~=~ \left\{ \frac{S_t}{t} \ge \min_i \left( \sqrt{\frac{t}{2^i}} + \sqrt{\frac{2^i}{t}} \right)\sqrt{\frac{\nu \ln \frac{1}{\gamma_i \delta}}{2 t}} \right\} && \text{(plug in the definition of $\eta_i$)} \end{align*} Writing \(\hat \mu_t - \mu = \frac{S_t}{t}\), it follows from \(\eqref{eq:doob}\) that \[\mathop{\mathrm{\mathbb P}}\left( \exists t : \hat \mu_t - \mu ~\ge~ \min_i \left( \sqrt{\frac{t}{2^i}} + \sqrt{\frac{2^i}{t}} \right) \sqrt{\frac{\nu \ln \frac{i(i+1)}{\delta}}{2 t}} \right) ~\le~ \delta .\] It remains to choose \(i\) to minimise the right-hand side.
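As a sketch of what this bound looks like numerically (the function name and the cap on \(i\) are my own), we can evaluate the minimum over integer \(i\) directly and confirm that the minimiser sits near \(\log_2 t\):

```python
import numpy as np

def deviation_bound(t, delta, nu=1.0, i_max=60):
    """min over integer i of (sqrt(t/2^i) + sqrt(2^i/t)) sqrt(nu ln(i(i+1)/delta) / (2t))."""
    i = np.arange(1, i_max + 1)
    lead = np.sqrt(t / 2.0**i) + np.sqrt(2.0**i / t)
    radius = lead * np.sqrt(nu * np.log(i * (i + 1) / delta) / (2 * t))
    return radius.min(), i[radius.argmin()]

for t in [10, 1_000, 100_000]:
    r, i_star = deviation_bound(t, delta=0.05)
    print(f"t={t}: radius={r:.4f}, best i={i_star}, log2(t)={np.log2(t):.2f}")
```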
Let us first consider what would happen if we ignore rounding and pick \(i = \log_2 t\), so that \(2^i = t\) and the leading factor \(\sqrt{t/2^i} + \sqrt{2^i/t}\) equals \(2\). Bounding \(i(i+1) \le (i+1)^2\), the deviation event would read \[\hat \mu_t - \mu ~\ge~ \sqrt{\frac{2 \nu \ln \frac{(1+\log_2 t)^2}{\delta}}{t}} .\] But \(i\) has to be an integer. So how do we pick \(i\) when we do enforce rounding?
If we use \(i \in \log_2 t + [0,1]\), i.e. we overshoot \(\log_2 t\) by at most \(1\), the worst case is \(2^i = 2t\), and we find that our leading factor can be at most as bad as \[\sqrt{\frac{9}{4} \frac{\nu}{t} \ln \dots}\]
If we instead use \(i \in \log_2 t + [-1/2, 1/2]\), i.e. we round to the nearest integer, the worst case is \(2^i = t \cdot 2^{\pm 1/2}\), and we find leading factor \[\sqrt{(1 + \sqrt{9/8})\frac{\nu}{t} \ln \dots}\] In both cases we are very close to the ideal factor of \(2\) obtained without rounding. Note that \(1 + \sqrt{9/8} \approx 2.0607\), which is where the \(2.07\) in Proposition 1 comes from.
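Both worst-case constants are a one-liner to verify numerically: writing \(r = 2^i/t\), the squared leading factor divided by two is \((1/r + 2 + r)/2\), which is convex in \(r\) and therefore maximised at the endpoints of the rounding interval.

```python
import numpy as np

for lo, hi, label in [(1.0, 2.0, "overshoot: i in log2(t) + [0, 1]"),
                      (2**-0.5, 2**0.5, "nearest:   i in log2(t) + [-1/2, 1/2]")]:
    r = np.linspace(lo, hi, 10_001)      # r = 2^i / t
    worst = ((1 / r + 2 + r) / 2).max()  # squared leading factor, halved
    print(f"{label}: worst constant = {worst:.4f}")
# prints 2.2500 (= 9/4) and 2.0607 (= 1 + sqrt(9/8))
```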
A simple two-sided argument (applying the one-sided bound twice, each time with \(\delta/2\)) also shows that \[\mathop{\mathrm{\mathbb P}}\left( \exists t : |\hat \mu_t - \mu| ~\ge~ \sqrt{\frac{2.07 \nu \ln \frac{2 (1+\log_2 t)^2}{\delta}}{t}} \right) ~\le~ \delta .\]
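Putting everything together, here is a hypothetical helper (the name `lil_radius` and its defaults are mine, not from any library) that evaluates this two-sided anytime-valid confidence radius:

```python
import math

def lil_radius(t, delta, nu=1.0):
    """Two-sided anytime-valid radius: |hat_mu_t - mu| < lil_radius(t, delta, nu)
    simultaneously for all t >= 1, with probability at least 1 - delta."""
    return math.sqrt(2.07 * nu * math.log(2 * (1 + math.log2(t)) ** 2 / delta) / t)

# Example: Bernoulli observations (nu = 1/4), 95% confidence, t = 1000 samples.
print(lil_radius(1000, delta=0.05, nu=0.25))   # about 0.066
```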