2017-04-26
The Law of the Iterated Logarithm is an asymptotic uniform deviation bound. We prove a finite-time version using martingale arguments.
Let \(X_1, X_2, \ldots\) be i.i.d. random variables with zero mean and unit variance. Let \(S_t = \sum_{i=1}^t X_i\). The Law of the Iterated Logarithm says that \[\limsup_{t \to \infty}~ \frac{S_t}{\sqrt{t \ln \ln t}} ~=~ \sqrt{2} \qquad \text{almost surely}.\] In this post we look at a finite-time version of this statement. We try to keep it simple and focus on the basics. To bring out one of the central ideas of the argument I will prove the following for sub-Gaussian variables (defined below).
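(As a quick empirical illustration, not needed for anything that follows, here is a minimal simulation sketch, assuming NumPy, that tracks the LIL ratio \(S_t / \sqrt{t \ln \ln t}\) along a single Gaussian random walk; for moderately large \(t\) its running maximum should be of the order \(\sqrt{2} \approx 1.414\).)

```python
import numpy as np

rng = np.random.default_rng(0)
T = 10**6
S = np.cumsum(rng.standard_normal(T))    # S_t for t = 1, ..., T
t = np.arange(100, T + 1)                # skip small t, where ln ln t is tiny
ratio = S[99:] / np.sqrt(t * np.log(np.log(t)))
print("max of S_t / sqrt(t ln ln t):", ratio.max())   # of the order sqrt(2)
```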
Proposition 1. Let \(X_1,X_2,\ldots\) be i.i.d. sub-Gaussian with zero mean and variance factor \(\nu = 1\). For each \(\delta \in (0,1)\), we have \[\mathop{\mathrm{\mathbb P}}\left( \exists t : S_t ~\ge~ \sqrt{2.07 t \ln \frac{(2+\log_2 t)^2}{\delta}} \right) ~\le~ \delta .\]
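Before diving into the proof, here is a hedged sanity check of Proposition 1 (my own illustration, over a finite horizon only): simulate many standard Gaussian random walks, which are sub-Gaussian with \(\nu = 1\), and count how often the boundary is ever crossed.

```python
import numpy as np

rng = np.random.default_rng(1)
delta, horizon, n_paths = 0.05, 10_000, 2_000
t = np.arange(1, horizon + 1)
# Boundary from Proposition 1 (with nu = 1).
boundary = np.sqrt(2.07 * t * np.log((2 + np.log2(t)) ** 2 / delta))

crossings = 0
for _ in range(n_paths):
    S = np.cumsum(rng.standard_normal(horizon))
    crossings += np.any(S >= boundary)
print(f"empirical crossing rate: {crossings / n_paths:.4f} (should be <= {delta})")
```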
See (Balsubramani 2014) for much more refined statements and techniques. For this post I am particularly interested in the method of proof, which is based on martingales. But let’s start by defining sub-Gaussian random variables.
A random variable \(X \sim \mathop{\mathrm{\mathbb P}}\) is sub-Gaussian with variance factor \(\nu \ge 0\) if \begin{equation}\label{eq:subgaussian} \mathop{\mathrm{\mathbb E}}\left[e^{\eta(X-\mathop{\mathrm{\mathbb E}}[X])}\right] ~\le~ e^{\frac{\eta^2 \nu}{2}} \qquad \text{for all $\eta \in \mathbb R$}. \end{equation} The two main examples we consider here are Gaussian and Bernoulli variables. If \(X \sim \mathcal N(\mu, \sigma^2)\), then indeed \(\eqref{eq:subgaussian}\) holds with \(\nu = \sigma^2\). If \(X \sim \mathcal B(\theta)\) then \(\eqref{eq:subgaussian}\) holds with \(\nu = \frac{1}{4}\). For the latter see e.g. this post.
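The Bernoulli claim is easy to check numerically. The following sketch (assuming NumPy; the grid ranges are my own choice) verifies \(\eqref{eq:subgaussian}\) with \(\nu = 1/4\) for \(X \sim \mathcal B(\theta)\) over a grid of \(\eta\) and \(\theta\).

```python
import numpy as np

nu = 0.25                                 # claimed variance factor for Bernoulli
etas = np.linspace(-10, 10, 401)
for theta in np.linspace(0.05, 0.95, 19):
    # E[exp(eta (X - theta))] for X ~ Bernoulli(theta)
    mgf = theta * np.exp(etas * (1 - theta)) + (1 - theta) * np.exp(-etas * theta)
    assert np.all(mgf <= np.exp(etas ** 2 * nu / 2) + 1e-12)
print("MGF bound holds on the whole grid")
```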
Let the sequence of random variables \(X_1, X_2, \ldots\) be i.i.d. sub-Gaussian with variance factor \(\nu\) and common mean \(\mathop{\mathrm{\mathbb E}}[X_i] = \mu\). Let \(S_t = \sum_{i=1}^t (X_i - \mu)\) be the centred sum of the first \(t\) outcomes. Let \(\gamma\) be a probability density on \(\eta \in \mathbb R\) and consider \[Z_t ~=~ \int \gamma(\eta) e^{\eta S_t - t \frac{\nu \eta^2}{2}} \textrm{d}\eta .\] First of all, \(Z_t\) is a supermartingale (under \(\mathop{\mathrm{\mathbb P}}\)). Why? Well, \begin{align*} \mathop{\mathrm{\mathbb E}}\left[Z_{t+1}\middle|X^t\right] &~=~ \mathop{\mathrm{\mathbb E}}\left[ \int \gamma(\eta) e^{\eta (S_t + X_{t+1} - \mu) - \frac{t+1}{2} \nu \eta^2} \textrm{d}\eta\middle|X^t\right] \\ &~=~ \int \gamma(\eta) e^{\eta S_t - \frac{t}{2} \nu \eta^2} \mathop{\mathrm{\mathbb E}}\left[e^{\eta (X_{t+1} - \mu) - \frac{\nu \eta^2}{2}} \middle|X^t\right] \textrm{d}\eta \\ &~\stackrel{\eqref{eq:subgaussian}}{\le}~ \int \gamma(\eta) e^{\eta S_t - t \frac{\nu \eta^2}{2}} \textrm{d}\eta ~=~ Z_t . \end{align*} It is obvious that \(Z_t\) is non-negative and \(Z_0 = 1\). It then follows, by Doob’s maximal inequality for non-negative supermartingales (also known as Ville’s inequality), that \begin{equation}\label{eq:doob} \mathop{\mathrm{\mathbb P}}\left( \exists t : Z_t ~\ge~ \frac{1}{\delta} \right) \le \delta . \end{equation} This inequality is what will give us the quantification over all \(t\) inside the event.
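To see the supermartingale and \(\eqref{eq:doob}\) at work, here is a small simulation sketch (assuming NumPy; the discrete grid of \(\eta\) values and the uniform weights are my own choice, not the prior used below): it estimates \(\mathop{\mathrm{\mathbb P}}(\exists t \le T : Z_t \ge 1/\delta)\), which should come out below \(\delta\).

```python
import numpy as np

rng = np.random.default_rng(2)
nu, delta, horizon, n_paths = 1.0, 0.1, 5_000, 1_000
etas = np.array([0.01, 0.03, 0.1, 0.3])
weights = np.full(len(etas), 1 / len(etas))   # uniform discrete prior, Z_0 = 1

exceed = 0
t = np.arange(1, horizon + 1)
for _ in range(n_paths):
    S = np.cumsum(rng.standard_normal(horizon))
    # Z_t = sum_i gamma_i exp(eta_i S_t - t nu eta_i^2 / 2), for every t at once
    Z = (weights * np.exp(np.outer(S, etas) - np.outer(t, nu * etas**2 / 2))).sum(axis=1)
    exceed += np.any(Z >= 1 / delta)
print(f"P(exists t: Z_t >= 1/delta) ~ {exceed / n_paths:.3f} (should be <= {delta})")
```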
We now consider the choice of prior \(\gamma\). The main guiding principle is convenience of analysis, so we pick a discrete prior. Let us put prior mass \(\gamma_i\) on \(\eta_i\) for \(i = 1, 2, \ldots\) Interestingly, it turns out to be handy to use \(\delta\) in the construction of \(\gamma\). (This would seem to violate Bayesian philosophy, but that does not concern us here.) The specific prior that we will consider is \[\gamma_i ~=~ \frac{1}{i(i+1)} \qquad \text{and} \qquad \eta_i ~=~ \sqrt{\frac{2 \ln \frac{1}{\gamma_i \delta}}{2^i \nu}} .\] Now since \[Z_t ~=~ \sum_i \gamma_i e^{\eta_i S_t - t \frac{\nu \eta_i^2}{2}} ~\ge~ \max_i~ \gamma_i e^{\eta_i S_t - t \frac{\nu \eta_i^2}{2}} ,\] we have \begin{align*} \left\{Z_t \ge \frac{1}{\delta}\right\} &~\supseteq~ \left\{\max_i \gamma_i e^{\eta_i S_t - t \frac{\nu \eta_i^2}{2}} \ge \frac{1}{\delta}\right\} \\ &~=~ \left\{ \frac{S_t}{t} \ge \min_i \frac{\nu \eta_i}{2} + \frac{1}{\eta_i t} \ln \frac{1}{\gamma_i \delta} \right\} && \text{(take logarithms and rearrange)} \\ &~=~ \left\{ \frac{S_t}{t} \ge \min_i \left( \sqrt{\frac{t}{2^i}} + \sqrt{\frac{2^i}{t}} \right)\sqrt{\frac{\nu \ln \frac{1}{\gamma_i \delta}}{2 t}} \right\} && \text{(plug in the definition of $\eta_i$)} \end{align*} Writing \(\hat \mu_t - \mu = \frac{S_t}{t}\), it follows from \(\eqref{eq:doob}\) that \[\mathop{\mathrm{\mathbb P}}\left( \exists t : \hat \mu_t - \mu ~\ge~ \min_i \left( \sqrt{\frac{t}{2^i}} + \sqrt{\frac{2^i}{t}} \right) \sqrt{\frac{\nu \ln \frac{i(i+1)}{\delta}}{2 t}} \right) ~\le~ \delta .\] It remains to choose \(i\) to minimise the right-hand side.
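As a sketch of what this bound looks like numerically (the function name and the cap on \(i\) are my own), we can evaluate the minimum over integer \(i\) directly and confirm that the minimiser sits near \(\log_2 t\):

```python
import numpy as np

def deviation_bound(t, delta, nu=1.0, i_max=60):
    """min over integer i of (sqrt(t/2^i) + sqrt(2^i/t)) sqrt(nu ln(i(i+1)/delta) / (2t))."""
    i = np.arange(1, i_max + 1)
    lead = np.sqrt(t / 2.0**i) + np.sqrt(2.0**i / t)
    radius = lead * np.sqrt(nu * np.log(i * (i + 1) / delta) / (2 * t))
    return radius.min(), i[radius.argmin()]

for t in [10, 1_000, 100_000]:
    r, i_star = deviation_bound(t, delta=0.05)
    print(f"t={t}: radius={r:.4f}, best i={i_star}, log2(t)={np.log2(t):.2f}")
```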
Let us first consider what would happen if we ignore rounding and pick \(i = \log_2 t\), so that \(2^i = t\) and the leading factor \(\sqrt{t/2^i} + \sqrt{2^i/t}\) equals \(2\). Bounding \(i(i+1) \le (i+1)^2\), the deviation event would read \[\hat \mu_t - \mu ~\ge~ \sqrt{\frac{2 \nu \ln \frac{(1+\log_2 t)^2}{\delta}}{t}} .\] But \(i\) has to be an integer. So how do we pick \(i\) when we do enforce rounding?
If we use \(i \in \log_2 t + [0,1]\), i.e. we overshoot \(\log_2 t\) by at most \(1\), the worst case is \(2^i = 2t\), and we find that our leading factor can be at most as bad as \[\sqrt{\frac{9}{4} \frac{\nu}{t} \ln \dots}\]
If we instead use \(i \in \log_2 t + [-1/2, 1/2]\), i.e. we round to the nearest integer, the worst case is \(2^i = t \cdot 2^{\pm 1/2}\), and we find leading factor \[\sqrt{(1 + \sqrt{9/8})\frac{\nu}{t} \ln \dots}\] In both cases we are very close to the ideal factor of \(2\) obtained without rounding. Note that \(1 + \sqrt{9/8} \approx 2.0607\), which is where the \(2.07\) in Proposition 1 comes from.
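Both worst-case constants are a one-liner to verify numerically: writing \(r = 2^i/t\), the squared leading factor divided by two is \((1/r + 2 + r)/2\), which is convex in \(r\) and therefore maximised at the endpoints of the rounding interval.

```python
import numpy as np

for lo, hi, label in [(1.0, 2.0, "overshoot: i in log2(t) + [0, 1]"),
                      (2**-0.5, 2**0.5, "nearest:   i in log2(t) + [-1/2, 1/2]")]:
    r = np.linspace(lo, hi, 10_001)      # r = 2^i / t
    worst = ((1 / r + 2 + r) / 2).max()  # squared leading factor, halved
    print(f"{label}: worst constant = {worst:.4f}")
# prints 2.2500 (= 9/4) and 2.0607 (= 1 + sqrt(9/8))
```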
A simple two-sided argument (applying the one-sided bound twice, each time with \(\delta/2\)) also shows that \[\mathop{\mathrm{\mathbb P}}\left( \exists t : |\hat \mu_t - \mu| ~\ge~ \sqrt{\frac{2.07 \nu \ln \frac{2 (1+\log_2 t)^2}{\delta}}{t}} \right) ~\le~ \delta .\]
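Putting everything together, here is a hypothetical helper (the name `lil_radius` and its defaults are mine, not from any library) that evaluates this two-sided anytime-valid confidence radius:

```python
import math

def lil_radius(t, delta, nu=1.0):
    """Two-sided anytime-valid radius: |hat_mu_t - mu| < lil_radius(t, delta, nu)
    simultaneously for all t >= 1, with probability at least 1 - delta."""
    return math.sqrt(2.07 * nu * math.log(2 * (1 + math.log2(t)) ** 2 / delta) / t)

# Example: Bernoulli observations (nu = 1/4), 95% confidence, t = 1000 samples.
print(lil_radius(1000, delta=0.05, nu=0.25))   # about 0.066
```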