# Introduction

Let $$X_1, X_2, \ldots$$ be i.i.d. random variables with zero mean and unit variance. Let $$S_t = \sum_{i=1}^t X_i$$. The Law of the Iterated Logarithm says that $\limsup_{t \to \infty}~ \frac{S_t}{\sqrt{t \ln \ln t}} ~=~ \sqrt{2} \qquad \text{almost surely}.$ In this post we look at a finite-time version of this statement. We try to keep it simple and focus on the basics. To bring out one of the central ideas of the argument I will prove the following for sub-Gaussian variables (defined below).

Proposition 1. Let $$X_1,X_2,\ldots$$ be i.i.d. sub-Gaussian with variance factor $$1$$. For each $$\delta \in (0,1)$$, we have $\operatorname*{\mathbb P}{\left( \exists t : S_t ~\ge~ \sqrt{2.07 t \ln \frac{(2+\log_2 t)^2}{\delta}} \right)} ~\le~ \delta .$
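
Before going through the proof, here is a Monte Carlo sketch of the statement (illustrative only, not part of the argument): we simulate a standard-Gaussian random walk, which is sub-Gaussian with variance factor $$1$$, and count how often it ever crosses the boundary before a finite horizon. The horizon, number of paths, and value of $$\delta$$ are arbitrary choices for this sketch.

```python
import numpy as np

# Illustrative Monte Carlo check of Proposition 1 for Gaussian increments.
# The horizon, number of paths, and delta are arbitrary choices.
rng = np.random.default_rng(0)
delta, horizon, n_paths = 0.05, 5_000, 1_000

t = np.arange(1, horizon + 1)
# Boundary from Proposition 1: sqrt(2.07 * t * ln((2 + log2 t)^2 / delta))
boundary = np.sqrt(2.07 * t * np.log((2 + np.log2(t)) ** 2 / delta))

# S[k, j] = partial sum S_{j+1} of path k
S = np.cumsum(rng.standard_normal((n_paths, horizon)), axis=1)
crossed = (S >= boundary).any(axis=1).mean()
print(f"crossing frequency up to t={horizon}: {crossed} (bound: {delta})")
```

The empirical crossing frequency should come out well below $$\delta$$, since the bound is uniform over all of time and rather conservative at any finite horizon.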

See (Balsubramani 2014) for much more refined statements and techniques. For this post I am particularly interested in the method of proof, which is based on martingales. But let’s start with defining sub-Gaussian random variables.

# Sub-Gaussian

A random variable $$X \sim \operatorname*{\mathbb P}$$ is sub-Gaussian with variance factor $$\nu \ge 0$$ if $$\label{eq:subgaussian} \operatorname*{\mathbb E}{\left[e^{\eta(X-\operatorname*{\mathbb E}[X])}\right]} ~\le~ e^{\frac{\eta^2 \nu}{2}} \qquad \text{for all } \eta \in \mathbb R.$$ The two main examples we consider here are Gaussian and Bernoulli variables. If $$X \sim \mathcal N(\mu, \sigma^2)$$, then indeed $$\eqref{eq:subgaussian}$$ holds with equality for $$\nu = \sigma^2$$. If $$X \sim \mathcal B(\theta)$$, then $$\eqref{eq:subgaussian}$$ holds with $$\nu = \frac{1}{4}$$ by Hoeffding's lemma. For the latter see e.g. this post.
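
The Bernoulli claim can be sanity-checked numerically (this is an illustration on a grid, not a proof; the grid ranges are arbitrary choices):

```python
import numpy as np

# Check on a grid that a Bernoulli(theta) variable, which has mean theta,
# satisfies the sub-Gaussian condition with variance factor nu = 1/4:
#   E[exp(eta * (X - theta))] <= exp(eta^2 / 8)   for all eta.
etas = np.linspace(-10, 10, 401)
worst_gap = -np.inf
for theta in np.linspace(0.01, 0.99, 99):
    # Moment generating function of the centred Bernoulli variable
    mgf = theta * np.exp(etas * (1 - theta)) + (1 - theta) * np.exp(-etas * theta)
    worst_gap = max(worst_gap, (np.log(mgf) - etas ** 2 / 8).max())
print(f"largest log-MGF excess over eta^2/8 on the grid: {worst_gap}")
```

The largest excess is (up to floating-point noise) zero, attained at $$\eta = 0$$, where both sides equal $$1$$.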

# Building a Super-Martingale

Let the sequence of random variables $$X_1, X_2, \ldots$$ be i.i.d. sub-Gaussian with variance factor $$\nu$$ and common mean $$\operatorname*{\mathbb E}[X_i] = \mu$$. Let $$S_t = \sum_{i=1}^t (X_i - \mu)$$ be the centred sum of the first $$t$$ outcomes. Let $$\gamma$$ be a probability distribution on $$\eta \in \mathbb R$$. Consider $Z_t ~=~ \int \gamma(\eta) e^{\eta S_t - t \frac{\nu \eta^2}{2}} {\textrm{d}}\eta$ First of all, $$Z_t$$ is a super-martingale (under $$\operatorname*{\mathbb P}$$). Why? Well, \begin{align*} \operatorname*{\mathbb E}{\left[Z_{t+1}\middle|X^t\right]} &~=~ \operatorname*{\mathbb E}{\left[ \int \gamma(\eta) e^{\eta (S_t + X_{t+1} - \mu) - \frac{t+1}{2} \nu \eta^2} {\textrm{d}}\eta\middle|X^t\right]} \\ &~=~ \int \gamma(\eta) e^{\eta S_t - \frac{t}{2} \nu \eta^2} \operatorname*{\mathbb E}{\left[e^{\eta (X_{t+1} - \mu) - \frac{\nu \eta^2}{2}} \middle|X^t\right]} {\textrm{d}}\eta \\ &~\stackrel{\eqref{eq:subgaussian}}{\le}~ \int \gamma(\eta) e^{\eta S_t - t \frac{\nu \eta^2}{2}} {\textrm{d}}\eta ~=~ Z_t \end{align*} It is obvious that $$Z_t$$ is non-negative and $$Z_0 = 1$$. It then follows, by Doob’s inequality, that $$\label{eq:doob} \operatorname*{\mathbb P}{\left( \exists t : Z_t ~\ge~ \frac{1}{\delta} \right)} \le \delta .$$ This result will be the basis of the quantification over $$t$$ inside the event.
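
The maximal inequality $$\eqref{eq:doob}$$ can be watched in action with a small simulation (illustrative only): we form $$Z_t$$ for Gaussian increments, with an arbitrary discrete prior of four $$\eta$$ values and uniform weights, and check how often $$Z_t$$ ever exceeds $$1/\delta$$.

```python
import numpy as np

# Simulation sketch of the mixture super-martingale Z_t for Gaussian
# increments (nu = 1).  The discrete prior below (four eta values with
# uniform weights) is an arbitrary illustrative choice; the inequality
# P(exists t : Z_t >= 1/delta) <= delta should hold for any prior.
rng = np.random.default_rng(1)
nu, horizon, n_paths, delta = 1.0, 500, 2_000, 0.1
etas, weights = [0.05, 0.1, 0.2, 0.4], [0.25] * 4

S = np.cumsum(rng.standard_normal((n_paths, horizon)), axis=1)
t = np.arange(1, horizon + 1)
# Z_t = sum_i gamma_i * exp(eta_i * S_t - t * nu * eta_i^2 / 2)
Z = sum(w * np.exp(e * S - t * nu * e ** 2 / 2) for w, e in zip(weights, etas))

freq = (Z.max(axis=1) >= 1 / delta).mean()
print(f"P(max_t Z_t >= {1 / delta:g}) ~ {freq} (bound: {delta})")
```

For Gaussian increments $$\eqref{eq:subgaussian}$$ holds with equality, so $$Z_t$$ is in fact a martingale, and the crossing frequency typically comes out close to (but below) $$\delta$$.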

# Picking the prior

We now consider the choice of prior $$\gamma$$. The main guiding principle is convenience of analysis, so we pick a discrete prior. Let us put prior mass $$\gamma_i$$ on $$\eta_i$$ for $$i = 1, 2, \ldots$$ Interestingly, it turns out to be handy to use $$\delta$$ in the construction of $$\gamma$$. (This would seem to violate Bayesian philosophy, but that does not concern us here.) The specific prior that we will consider is $\gamma_i ~=~ \frac{1}{i(i+1)} \qquad \text{and} \qquad \eta_i ~=~ \sqrt{\frac{2 \ln \frac{1}{\gamma_i \delta}}{2^i \nu}} .$ Now since $Z_t ~=~ \sum_i \gamma_ie^{\eta_i S_t - t \frac{\nu \eta_i^2}{2}} ~\ge~ \max_i~ \gamma_ie^{\eta_i S_t - t \frac{\nu \eta_i^2}{2}} ,$ we have \begin{align*} {\left\{Z_t \ge \frac{1}{\delta}\right\}} &~\supseteq~ {\left\{\max_i \gamma_ie^{\eta_i S_t - t \frac{\nu \eta_i^2}{2}} \ge \frac{1}{\delta}\right\}} \\ &~=~ {\left\{ \frac{S_t}{t} \ge \min_i \frac{\nu \eta_i}{2} + \frac{1}{\eta_i t} \ln \frac{1}{\gamma_i \delta} \right\}} \\ &~=~ {\left\{ \frac{S_t}{t} \ge \min_i {\left( \sqrt{\frac{t}{2^i}} + \sqrt{\frac{2^i}{t}} \right)}\sqrt{\frac{\nu \ln \frac{1}{\gamma_i \delta}}{2 t}} \right\}} \end{align*} Writing $$\hat \mu_t - \mu = \frac{S_t}{t}$$, it follows from $$\eqref{eq:doob}$$ that $\operatorname*{\mathbb P}{\left( \exists t : \hat \mu_t - \mu ~\ge~ \min_i {\left( \sqrt{\frac{t}{2^i}} + \sqrt{\frac{2^i}{t}} \right)} \sqrt{\frac{\nu \ln \frac{i(i+1)}{\delta}}{2 t}} \right)} ~\le~ \delta .$ It remains to choose $$i$$ to minimise the right-hand side.
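
To get a feel for how tight the discrete minimum over $$i$$ is, we can evaluate it numerically and compare it against the idealised threshold obtained by plugging in $$i = \log_2 t$$ without rounding (an illustration; $$\nu = 1$$ and $$\delta = 0.05$$ are arbitrary choices):

```python
import numpy as np

# Compare min_i (sqrt(t/2^i) + sqrt(2^i/t)) * sqrt(nu * ln(i(i+1)/delta) / (2t))
# against the idealised threshold sqrt(2 * nu * ln((1 + log2 t)^2 / delta) / t).
nu, delta = 1.0, 0.05
ratios = []
for t in [10, 100, 1000, 10 ** 6]:
    best = min(
        (np.sqrt(t / 2 ** i) + np.sqrt(2 ** i / t))
        * np.sqrt(nu * np.log(i * (i + 1) / delta) / (2 * t))
        for i in range(1, 80)
    )
    ideal = np.sqrt(2 * nu * np.log((1 + np.log2(t)) ** 2 / delta) / t)
    ratios.append(best / ideal)
    print(f"t={t}: discrete minimum / idealised threshold = {best / ideal:.4f}")
```

The ratio stays within a few percent of $$1$$ across many orders of magnitude of $$t$$, which is the point of the next section.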

# Picking $$i$$

Let us first consider what would happen if we ignored rounding and picked $$i = \log_2 t$$, so that $$2^i = t$$ and the factor $$\sqrt{t/2^i} + \sqrt{2^i/t}$$ equals $$2$$. The event would then read $\hat \mu_t - \mu ~\ge~ \sqrt{\frac{2 \nu \ln \frac{(1+\log_2 t)^2}{\delta}}{t}} .$ So how do we pick $$i$$ when we do enforce rounding?

## Ceiling

If we use the ceiling $$i = \lceil \log_2 t \rceil \in \log_2 t + [0,1]$$, i.e. we overshoot by at most $$1$$, the worst case is $$2^i = 2t$$, and the leading factor can be at most as bad as $\sqrt{\frac{9}{4} \frac{\nu}{t} \ln \dots}$

## Rounding

If we instead round to the nearest integer, so that $$i \in \log_2 t + [-1/2, 1/2]$$, the worst case is $$2^i = t \sqrt 2$$ (or $$2^i = t/\sqrt 2$$), and we find leading factor $\sqrt{{\left(1 + \sqrt{9/8}\right)}\frac{\nu}{t} \ln \dots}$ In all cases we are very close to optimal. Note that $$1 + \sqrt{9/8} \approx 2.0607 \le 2.07$$, which is where the constant in Proposition 1 comes from.
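
Both constants are just endpoint evaluations of $$\frac{1}{2}{\left(\sqrt{c} + \sqrt{1/c}\right)}^2$$ with $$c = 2^i/t$$, which we can verify numerically:

```python
import numpy as np

# With c = 2^i / t, the squared leading factor is
#   (sqrt(c) + sqrt(1/c))^2 / 2 = (c + 1/c + 2) / 2,
# which is maximised at the endpoints of the rounding interval.
c_ceil = np.linspace(1, 2, 100001)                # ceiling: c in [1, 2]
c_round = np.linspace(2 ** -0.5, 2 ** 0.5, 100001)  # rounding: c in [1/sqrt2, sqrt2]
worst_ceil = ((np.sqrt(c_ceil) + 1 / np.sqrt(c_ceil)) ** 2 / 2).max()
worst_round = ((np.sqrt(c_round) + 1 / np.sqrt(c_round)) ** 2 / 2).max()
print(worst_ceil, worst_round)
```

The two maxima come out as $$9/4 = 2.25$$ and $$1 + \sqrt{9/8} \approx 2.0607$$ respectively.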

## Two-sided version

A simple two-sided argument (applying the one-sided bound to both $$X_i$$ and $$-X_i$$, each with confidence $$\delta/2$$) also shows that $\operatorname*{\mathbb P}{\left( \exists t : {|\hat \mu_t - \mu|} ~\ge~ \sqrt{\frac{2.07 \nu \ln \frac{2 (1+\log_2 t)^2}{\delta}}{t}} \right)} ~\le~ \delta .$

Balsubramani, Akshay. 2014. “Sharp Uniform Martingale Concentration Bounds.” CoRR abs/1405.2639. http://arxiv.org/abs/1405.2639.