2021-11-07

# Introduction

In this post I describe and solve a problem that came up in a discussion with Shubhada Agrawal and Sandeep Juneja, both of TIFR Mumbai. We are interested in proving worst-case regret bounds. Traditional worst-case regret results bound the performance of the learner in terms of some baseline performance plus an overhead. The idea here is instead to compare to a scaled-down baseline. Ideally, this makes the problem easier, and hence the overhead smaller.

## Setting

We are looking at the $$K$$ experts mix-gains setting with positive gains. More precisely, in every round $$t=1,2,\ldots$$ the learner produces a distribution $${\boldsymbol w}_t \in \triangle_K$$, the adversary produces a gains vector $${\boldsymbol g}_t \in \mathbb R_+^K$$, and the learner receives the mix gain $$\ln \left(\sum_{k=1}^K w_t^k e^{g_t^k}\right)$$. After $$T$$ rounds, the regret w.r.t. the best expert is the difference between the maximum cumulative gain among the experts and the learner’s cumulative gain: $R_T ~=~ \max_k \sum_{t=1}^T g_t^k - \sum_{t=1}^T \ln \left(\sum_{k=1}^K w_t^k e^{g_t^k}\right) .$ It is well known (and will also follow from the analysis below) that the worst-case regret is $$\ln K$$ for any horizon $$T \ge 1$$.
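As a small sanity check (not part of the original argument), here is a minimal Python sketch of this protocol in which the Bayes mixture / exponential weights strategy $$w_t^k \propto e^{G_{t-1}^k}$$ attains the classical $$\ln K$$ bound by telescoping; the function names are mine.

```python
import math
import random

def mix_gain(w, g):
    """Learner's mix gain ln(sum_k w_k * exp(g_k)) for one round."""
    return math.log(sum(wk * math.exp(gk) for wk, gk in zip(w, g)))

def exp_weights_regret(gains):
    """Play w_t^k proportional to exp(G_{t-1}^k); return the regret R_T."""
    K = len(gains[0])
    G = [0.0] * K            # cumulative expert gains
    learner = 0.0            # cumulative learner mix gain
    for g in gains:
        Z = sum(math.exp(Gk) for Gk in G)
        w = [math.exp(Gk) / Z for Gk in G]
        learner += mix_gain(w, g)
        G = [Gk + gk for Gk, gk in zip(G, g)]
    return max(G) - learner

random.seed(0)
K, T = 5, 50
gains = [[random.uniform(0.0, 2.0) for _ in range(K)] for _ in range(T)]
# The telescoping argument guarantees regret at most ln K at any horizon.
assert exp_weights_regret(gains) <= math.log(K) + 1e-9
```

The telescoping identity behind the assertion is $$\sum_t \ln\left(\sum_k w_t^k e^{g_t^k}\right) = \ln \sum_k e^{G_T^k} - \ln K \ge \max_k G_T^k - \ln K$$.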

In this post we are looking at an easier problem, obtained by dampening the gain of the best expert by some factor $$\lambda \in [0,1]$$. That is, we consider the scaled-comparator regret $R_T^\lambda ~=~ \lambda \max_k \sum_{t=1}^T g_t^k - \sum_{t=1}^T \ln \left(\sum_{k=1}^K w_t^k e^{g_t^k}\right) .$ And in particular we are interested in the minimax regret, i.e. $V_T^\lambda ~=~ \Big\langle\min_{{\boldsymbol w}_t \in \triangle_K} \sup_{{\boldsymbol g}_t \in \mathbb R_+^K}\Big\rangle_{t=1}^T~ R_T^\lambda .$

The main result of this post is

Proposition 1. The minimax regret of the $$K\ge 1$$ experts, $$T \ge 1$$ round scaled-comparator regret problem is $V_T^\lambda ~=~ \operatorname{kl}(\lambda, \lambda \wedge 1/K) ,$ where $$\operatorname{kl}(x,y) = x \ln \frac{x}{y} + (1-x) \ln \frac{1-x}{1-y}$$ is the Bernoulli relative entropy from $$x$$ to $$y$$, and $$x \wedge y = \min\left\{x,y\right\}$$ is the binary minimum.

A few remarks are in order.

• For $$\lambda=1$$ we recover the classical $$V_T^1 = \ln K$$ result, which holds for the exponential weights strategy. As we will see below, we will need a different strategy for $$\lambda <1$$.

• For $$\lambda \le 1/K$$, we find $$V_T^\lambda = 0$$. At first sight this seems counterintuitive. But on second thought, by Jensen’s inequality $\ln \left(\sum_{k=1}^K w_t^k e^{g_t^k}\right) ~\ge~ \sum_{k=1}^K w_t^k g_t^k ,$ so that by always playing uniform weights $$w_t^k = 1/K$$ the learner ensures gain at least the average expert gain, which (as gains are nonnegative) is at least the gain of the best expert divided by $$K$$. Hence the scaled-comparator regret is at most zero. On the other hand, the adversary can hand out no gain $${\boldsymbol g}_t = {\boldsymbol 0}$$ and ensure the regret is zero. All in all, indeed, $$V_T^\lambda = 0$$ for $$\lambda \le 1/K$$.

• As we will see, the strategy for the learner is independent of the horizon $$T$$. The learner can hence ensure that $$R_T^\lambda \le \operatorname{kl}(\lambda, \lambda \wedge 1/K)$$ at any time $$T = 1, 2, \ldots$$.
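The value in Proposition 1 is easy to probe numerically at the extremes discussed above. A minimal sketch (the function names are mine, not from the post):

```python
import math

def kl(x, y):
    """Bernoulli relative entropy kl(x, y), with the 0*ln(0) = 0 convention."""
    def term(a, b):
        return 0.0 if a == 0.0 else a * math.log(a / b)
    return term(x, y) + term(1.0 - x, 1.0 - y)

def minimax_regret(lam, K):
    """The value from Proposition 1: kl(lambda, lambda ∧ 1/K)."""
    return kl(lam, min(lam, 1.0 / K))

K = 4
# lambda = 1 recovers the classical bound: kl(1, 1/K) = ln K.
assert abs(minimax_regret(1.0, K) - math.log(K)) < 1e-12
# lambda <= 1/K makes the game free: the value is zero.
assert minimax_regret(0.2, K) == 0.0
# In between, the value lies strictly between 0 and ln K.
assert 0.0 < minimax_regret(0.5, K) < math.log(K)
```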

# Proof of Proposition 1

Recall that we are after the $$T$$-round minimax scaled-comparator regret $V_T^\lambda ~:=~ \left\langle\min_{{\boldsymbol w}_t \in \triangle_K} \sup_{{\boldsymbol g}_t \in \mathbb R_+^K}\right\rangle_{t=1}^T \left\{ \lambda \max_k \sum_{t=1}^T g_t^k - \sum_{t=1}^T \ln \left(\sum_k w_t^k e^{g_t^k}\right) \right\} .$ Let us introduce the following shorthand. After having played $$t<T$$ rounds with cumulative expert gains $${\boldsymbol G}_t = \sum_{s=1}^t {\boldsymbol g}_s$$, let us define the value-to-go of the remaining $$T-t \ge 1$$ rounds as \begin{align} \label{eq:vtg} V_{t,T}^\lambda({\boldsymbol G}_t) ~=~ \left\langle\min_{{\boldsymbol w}_s \in \triangle_K} \sup_{{\boldsymbol g}_s \in \mathbb R_+^K}\right\rangle_{s=t+1}^T \left\{ \lambda \max_k \left(G_t^k + \sum_{s=t+1}^T g_s^k\right) - \sum_{s=t+1}^T \ln \left(\sum_k w_s^k e^{g_s^k}\right) \right\} .\end{align} We will show the following.

Proposition 2. The value-to-go $$\eqref{eq:vtg}$$ satisfies $$V_{t,T}^\lambda({\boldsymbol G}_t) = V^\lambda({\boldsymbol G}_t)$$, where \begin{align} \label{eq:primal} V^\lambda({\boldsymbol G}) ~=~ \min_{{\boldsymbol w}\in \triangle_K} \max_k \left\{ \lambda G^k + \operatorname{kl}(\lambda, \lambda \wedge w^k) \right\} . \end{align}

Proposition 2 is proved in the next two subsections. Proposition 1 follows from it by considering $$t=0$$, where $${\boldsymbol G}_0 = {\boldsymbol 0}$$. Indeed, by symmetry the minimum in $$\eqref{eq:primal}$$ is then attained at the uniform weights $$w^k = 1/K$$, so that $$V^\lambda({\boldsymbol 0}) = \operatorname{kl}(\lambda, \lambda \wedge 1/K)$$ as desired.

Note how Proposition 2 implies that all dependence on the past rounds is summarised purely by $${\boldsymbol G}$$, while the number of future rounds remaining is immaterial.

We will also prove and use the alternative expressions: \begin{align} V^\lambda({\boldsymbol G}) &~=~ \label{eq:sec} \min_{{\boldsymbol w}\in \triangle_K} \max_k \sup_{{\boldsymbol g}\in \mathbb R_+^K} \left\{ \lambda (G^k + g^k) - \ln \left(\sum_j w^j e^{g^j}\right) \right\} \\ \label{eq:entropy} &~=~ \max_{{\boldsymbol q}\in \triangle_K}~ \lambda \sum_k q_k G^k + \min_{{\boldsymbol w}\in \triangle_K} \sum_k q_k \operatorname{kl}(\lambda, \lambda \wedge w^k) \\ \label{eq:ih} &~=~ \min_{{\boldsymbol w}\in \triangle_K} \sup_{{\boldsymbol g}\in \mathbb R_+^K}\left\{ V^\lambda({\boldsymbol G}+{\boldsymbol g}) - \ln \left(\sum_k w^k e^{g^k}\right) \right\}\end{align}

The expressions $$\eqref{eq:sec}$$ and $$\eqref{eq:ih}$$ will be proved in the base case and induction step respectively. The expression $$\eqref{eq:entropy}$$ is provided here purely for insight. It is obtained from $$\eqref{eq:primal}$$ by convexifying the inner problem $$\max_k \cdot$$ to $$\max_{{\boldsymbol q}\in \triangle_K} \operatorname*{\mathbb E}_{k \sim {\boldsymbol q}}\left[\cdot\right]$$ and swapping $$\min_{{\boldsymbol w}}$$ and $$\max_{{\boldsymbol q}}$$. It reveals that $$V^\lambda({\boldsymbol G})$$ is convex in $${\boldsymbol G}$$, as it is a max of linear functions. Moreover, it is increasing in each entry of $${\boldsymbol G}$$, and shifts additive constants scaled by $$\lambda$$, i.e. $$V^\lambda({\boldsymbol G}+ c {\boldsymbol 1}) = V^\lambda({\boldsymbol G}) + \lambda c$$. Finally, it establishes $$V^\lambda({\boldsymbol G})$$ as a regularised maximum of $$\lambda {\boldsymbol G}$$, with the regulariser (or “entropy”) defined on the probability simplex by ${\boldsymbol q}~\mapsto~ \min_{{\boldsymbol w}\in \triangle_K} \sum_k q_k \operatorname{kl}(\lambda, \lambda \wedge w^k) .$
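For $$K=2$$ the primal expression $$\eqref{eq:primal}$$ can be evaluated by a crude grid search over $$w$$, which lets us sanity-check the constant-shift property and the Proposition 1 value numerically. A rough sketch (all names are mine; the grid resolution is arbitrary):

```python
import math

def kl(x, y):
    """Bernoulli relative entropy kl(x, y), with 0*ln(0) = 0."""
    def term(a, b):
        return 0.0 if a == 0.0 else a * math.log(a / b)
    return term(x, y) + term(1.0 - x, 1.0 - y)

def V(lam, G, grid=10001):
    """Primal form for K = 2: min over w = (w, 1-w) of
    max_k { lam * G_k + kl(lam, lam ∧ w_k) }, via a grid over w."""
    best = float("inf")
    for i in range(1, grid):
        w = i / grid
        val = max(lam * G[0] + kl(lam, min(lam, w)),
                  lam * G[1] + kl(lam, min(lam, 1.0 - w)))
        best = min(best, val)
    return best

lam, G, c = 0.7, (0.3, 1.1), 0.5
# Adding a constant c to all gains shifts the value by lambda * c.
assert abs(V(lam, (G[0] + c, G[1] + c)) - V(lam, G) - lam * c) < 1e-3
# At G = 0 the value matches Proposition 1 with K = 2.
assert abs(V(lam, (0.0, 0.0)) - kl(lam, 0.5)) < 1e-3
```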

## Proposition 2: Base case

Let’s start at the end, with the last round. By definition $$\eqref{eq:vtg}$$, $V_{T-1,T}^\lambda({\boldsymbol G}_{T-1}) ~=~ \min_{{\boldsymbol w}_T \in \triangle_K} \sup_{{\boldsymbol g}_T \in \mathbb R_+^K} \left\{ \lambda \max_k (G_{T-1}^k + g_T^k) - \ln \left(\sum_k w_T^k e^{g_T^k}\right) \right\} .$ Taking the $$\max_k$$ to the front results in $V_{T-1,T}^\lambda({\boldsymbol G}_{T-1}) ~=~ \min_{{\boldsymbol w}_T \in \triangle_K} \max_k \sup_{{\boldsymbol g}_T \in \mathbb R_+^K} \left\{ \lambda (G_{T-1}^k + g_T^k) - \ln \left(\sum_j w_T^j e^{g_T^j}\right) \right\} ,$ which, once we establish $$\eqref{eq:primal}$$ below, also proves $$\eqref{eq:sec}$$. Plugging in the worst-case choice $$g_T^j=0$$ for $$j \neq k$$ further results in $V_{T-1,T}^\lambda({\boldsymbol G}_{T-1}) ~=~ \min_{{\boldsymbol w}_T \in \triangle_K} \max_k \sup_{g_T^k \in \mathbb R_+} \left\{ \lambda (G_{T-1}^k + g_T^k) - \ln \left(w_T^k e^{g_T^k} + (1-w_T^k)\right) \right\} .$ The best choice for $$g_T^k$$ is $g_T^k ~=~ \max\left\{ 0, \ln \frac{ \lambda (1-w_T^k) }{ (1-\lambda) w_T^k } \right\} ,$ at which point the inner objective becomes $$\lambda G_{T-1}^k + \operatorname{kl}(\lambda, \lambda \wedge w_T^k)$$. We hence end up with $V_{T-1,T}^\lambda({\boldsymbol G}_{T-1}) ~=~ \min_{{\boldsymbol w}_T \in \triangle_K} \max_k \left\{ \lambda G_{T-1}^k + \operatorname{kl}(\lambda, \lambda \wedge w_T^k) \right\} ,$ which finally establishes the desired form $$\eqref{eq:primal}$$.
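The one-round optimisation above is easy to verify numerically: the claimed maximiser attains the closed form, and no other gain does better. A small sketch (variable names are mine):

```python
import math

def kl(x, y):
    """Bernoulli relative entropy kl(x, y), with 0*ln(0) = 0."""
    def term(a, b):
        return 0.0 if a == 0.0 else a * math.log(a / b)
    return term(x, y) + term(1.0 - x, 1.0 - y)

def objective(lam, G, w, g):
    """One-round inner objective lam*(G + g) - ln(w*e^g + (1 - w))."""
    return lam * (G + g) - math.log(w * math.exp(g) + (1.0 - w))

lam, G, w = 0.6, 0.8, 0.25
# The claimed maximiser over g >= 0.
g_star = max(0.0, math.log(lam * (1.0 - w) / ((1.0 - lam) * w)))
# Its value matches the closed form lam*G + kl(lam, lam ∧ w) ...
assert abs(objective(lam, G, w, g_star) - (lam * G + kl(lam, min(lam, w)))) < 1e-9
# ... and no point on a grid over g in [0, 10] does better.
assert all(objective(lam, G, w, i / 100) <= objective(lam, G, w, g_star) + 1e-9
           for i in range(1001))
```

Since the objective is concave in $$g$$ (its derivative $$\lambda - \frac{w e^g}{w e^g + 1 - w}$$ is decreasing), the grid check over a bounded range is a reasonable proxy for the sup over all of $$\mathbb R_+$$.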

## Proposition 2: Induction step

Unpacking the definition of value-to-go $$\eqref{eq:vtg}$$, we find \begin{align*} V_{t-1,T}^\lambda({\boldsymbol G}_{t-1}) &~=~ \min_{{\boldsymbol w}_t \in \triangle_K} \sup_{{\boldsymbol g}_t \in \mathbb R_+^K} \left\{ V_{t,T}^\lambda({\boldsymbol G}_{t-1} + {\boldsymbol g}_t) - \ln \left(\sum_k w_t^k e^{g_t^k}\right) \right\} . \end{align*} To show that $$V_{t-1,T}^\lambda({\boldsymbol G}_{t-1}) = V^\lambda({\boldsymbol G}_{t-1})$$, it hence remains to show the “minimax pullback” property $$\eqref{eq:ih}$$. We will prove both inequalities of $$\eqref{eq:ih}$$ separately. The case $$\ge$$ is obvious by taking $${\boldsymbol g}={\boldsymbol 0}$$. For $$\le$$, we expand $$V^\lambda$$ using $$\eqref{eq:sec}$$: \begin{align*} & \min_{{\boldsymbol w}\in \triangle_K} \sup_{{\boldsymbol g}\in \mathbb R_+^K}\left\{ V^\lambda({\boldsymbol G}+{\boldsymbol g}) - \ln \left(\sum_k w_k e^{g_k}\right) \right\} \\ &~=~ \min_{{\boldsymbol w}\in \triangle_K} \sup_{{\boldsymbol g}\in \mathbb R_+^K}\left\{ \min_{{\boldsymbol w}' \in \triangle_K} \max_k \sup_{{\boldsymbol g}' \in \mathbb R_+^K} \left\{ \lambda (G_k + g_k + g'_k) - \ln \left(\sum_j w'_j e^{g'_j}\right) \right\} - \ln \left(\sum_j w_j e^{g_j}\right) \right\} . \end{align*} Now if we take $${\boldsymbol w}'$$ to be the exponential weights update of prior $${\boldsymbol w}$$ encountering gain $${\boldsymbol g}$$, i.e. 
$$w'_k \propto w_k e^{g_k}$$, then we find \begin{align*} &~\le~ \min_{{\boldsymbol w}\in \triangle_K} \sup_{{\boldsymbol g}\in \mathbb R_+^K}\left\{ \max_k \sup_{{\boldsymbol g}' \in \mathbb R_+^K} \left\{ \lambda (G_k + g_k + g'_k) - \ln \left(\sum_j \frac{w_j e^{g_j}}{\sum_i w_i e^{g_i}} e^{g'_j}\right) \right\} - \ln \left(\sum_j w_j e^{g_j}\right) \right\} \\ &~=~ \min_{{\boldsymbol w}\in \triangle_K} \max_k \sup_{{\boldsymbol g}, {\boldsymbol g}' \in \mathbb R_+^K} \left\{ \lambda (G_k + g_k + g'_k) - \ln \left(\sum_j w_j e^{g_j + g'_j}\right) \right\} \end{align*} Finally, merging $${\boldsymbol g}+ {\boldsymbol g}'$$ into a single gain vector in $$\mathbb R_+^K$$, we recognise $$\eqref{eq:sec}$$ and conclude \begin{align*} &~=~ V^\lambda({\boldsymbol G}) , \end{align*} as desired.
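The merging step rests on an exact identity: the mix gain under the prior plus the mix gain under the exponential weights posterior equals the mix gain of the merged gain vector under the prior. A quick numerical check (names are mine):

```python
import math
import random

def log_mix(w, g):
    """ln(sum_k w_k * exp(g_k))."""
    return math.log(sum(wk * math.exp(gk) for wk, gk in zip(w, g)))

random.seed(1)
K = 4
w = [random.random() for _ in range(K)]
w = [wk / sum(w) for wk in w]                    # prior on the simplex
g = [random.uniform(0, 2) for _ in range(K)]     # first gain vector
g2 = [random.uniform(0, 2) for _ in range(K)]    # second gain vector

# Exponential weights posterior: w'_k proportional to w_k * exp(g_k).
Z = sum(wk * math.exp(gk) for wk, gk in zip(w, g))
w_post = [wk * math.exp(gk) / Z for wk, gk in zip(w, g)]

# Two mix gains (prior, then posterior) equal one mix gain on merged gains.
lhs = log_mix(w, g) + log_mix(w_post, g2)
rhs = log_mix(w, [gk + g2k for gk, g2k in zip(g, g2)])
assert abs(lhs - rhs) < 1e-12
```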

# Discussion

We have quantified how the regret game gets easier when the comparator gain is scaled by a factor $$\lambda \in [0,1]$$. Interestingly, this crucially depends on the assumption that gains are positive. If gains may also be negative, then scaling by a factor $$\lambda < 1$$ may make them larger, thus boosting the comparator gain. In turn, we find that for two-sided gains the minimax regret is infinite unless $$\lambda = 1$$. Yet another interesting combination is negative gains (which we may rephrase as positive losses) with scale factor $$\lambda \ge 1$$. Here we find that the minimax scaled-comparator regret is $$\ln K$$ for any $$\lambda \ge 1$$. Even though $$\lambda >1$$ does hamper the comparator, the game does not get any easier.