Minimax on the Ball

Wouter M. Koolen

2015-09-29

Abstract

We compare two minimax strategies on the ball.

Introduction

We consider minimax games on the ball. Each round, the learner plays a point \(\boldsymbol{a}\) with \(\left\|\boldsymbol{a}\right\| \le 1\), and the adversary plays a point \(\boldsymbol{x}\) with \(\left\|\boldsymbol{x}\right\| \le 1\). We consider two loss functions

Linear loss. The learner incurs \(\ell(\boldsymbol{a},\boldsymbol{x}) = \boldsymbol{a}^\intercal\boldsymbol{x}\).
Quadratic loss. The learner incurs \(\ell(\boldsymbol{a}, \boldsymbol{x}) = \left\|\boldsymbol{a}-\boldsymbol{x}\right\|^2\).

Consider a \(T\)-round game where the players played \(\boldsymbol{a}_1, \boldsymbol{x}_1, \ldots, \boldsymbol{a}_T, \boldsymbol{x}_T\). Then the cumulative loss of the learner is \[\sum_{t=1}^T \ell(\boldsymbol{a}_t, \boldsymbol{x}_t),\] and the cumulative loss of the best fixed action in hindsight is \[\min_{\boldsymbol{a}} \sum_{t=1}^T \ell(\boldsymbol{a}, \boldsymbol{x}_t) .\]

Best offline

The best action in hindsight is different for the two loss functions. In both cases, let us abbreviate \(\boldsymbol{s}_T = \sum_{t=1}^T \boldsymbol{x}_t\).

For linear loss the best action is \(\frac{-\boldsymbol{s}_T}{\left\|\boldsymbol{s}_T\right\|}\), that is, the point on the unit sphere least aligned with direction \(\boldsymbol{s}_T\).
For quadratic loss the best action is \(\frac{\boldsymbol{s}_T}{T}\), that is, the data mean.

Regret

We call regret the overhead of the Learner compared to the best action \[\text{Regret}_T ~=~ \sum_{t=1}^T \ell(\boldsymbol{a}_t, \boldsymbol{x}_t) - \min_{\boldsymbol{a}} \sum_{t=1}^T \ell(\boldsymbol{a}, \boldsymbol{x}_t) .\] The minimax regret of a \(T\)-round game is given by \[\min_{\boldsymbol{a}_1} \max_{\boldsymbol{x}_1} \cdots \min_{\boldsymbol{a}_T} \max_{\boldsymbol{x}_T} ~ \text{Regret}_T .\] The minimax algorithm for either game is known. In either case we abbreviate \(\boldsymbol{s}_t = \sum_{q=1}^t \boldsymbol{x}_q\).

For linear loss, the minimax strategy was found by (Abernethy et al. 2008). It predicts with \[\boldsymbol{a}_t ~=~ \frac{- \boldsymbol{s}_{t-1}}{\sqrt{\left\|\boldsymbol{s}_{t-1}\right\|^2 + (T-t+1)}}\]
For quadratic loss, the minimax strategy was found by (Takimoto and Warmuth 2000). It predicts with \[\boldsymbol{a}_t ~=~ \alpha_t \boldsymbol{s}_{t-1}\] where the shrinkage \(\alpha_t\) is recursively defined by \(\alpha_T = 1/T\) and \(\alpha_t = \alpha_{t+1}^2 + \alpha_{t+1}\). (very roughly \(\alpha_t \approx 1/t\).)

In both cases, we see that the minimax strategy plays a dampened version of the best offline action thus far.

Minimax strategy. Optimal prediction \boldsymbol{a} as a function of the state \boldsymbol{s} after 25 of 50 rounds total. — Minimax strategy. Optimal prediction \(\boldsymbol{a}\) as a function of the state \(\boldsymbol{s}\) after \(25\) of \(50\) rounds total.

Question

We might also consider the loss \(\ell(\boldsymbol{a},\boldsymbol{x}) = - (\boldsymbol{a}^\intercal\boldsymbol{x})^2\), still with \(\boldsymbol{a}\) and \(\boldsymbol{x}\) from the unit ball, which relates to PCA. Is the minimax algorithm tractable? Now the best offline action is the eigenvector of largest eigenvalue of the data second moment matrix \[\max_{\boldsymbol{a}} \sum_t (\boldsymbol{a}^\intercal\boldsymbol{x}_t)^2 ~=~ %\max_{\a} \sum_t \a^\top \x_t \x_t^\top \a %~=~ \max_{\boldsymbol{a}} \boldsymbol{a}^\intercal\left(\sum_t \boldsymbol{x}_t \boldsymbol{x}_t^\intercal\right) \boldsymbol{a} ~=~ \lambda_\text{max} \left(\sum_t \boldsymbol{x}_t \boldsymbol{x}_t^\intercal\right) .\] Rather than quadratic, this problem has a linear feel to it, and that is correct. It is the linear problem in the space of dyads (outer products \(\boldsymbol{a}\boldsymbol{a}^\intercal\) of unit vectors \(\boldsymbol{a}\)) where \((\boldsymbol{a}^\intercal\boldsymbol{x})^2 = \mathop{\mathrm{tr}}\left(\boldsymbol{a}\boldsymbol{a}^\intercal\boldsymbol{x}\boldsymbol{x}^\intercal\right)\) is the natural inner product.

Abernethy, Jacob, Peter L. Bartlett, Alexander Rakhlin, and Ambuj Tewari. 2008. “Optimal Stragies and Minimax Lower Bounds for Online Convex Games.” In 21st Annual Conference on Learning Theory - COLT 2008, Helsinki, Finland, July 9-12, 2008, edited by Rocco A. Servedio and Tong Zhang, 415–24. Omnipress. http://colt2008.cs.helsinki.fi/papers/111-Abernethy.pdf.

Takimoto, Eiji, and Manfred K. Warmuth. 2000. “The Minimax Strategy for Gaussian Density Estimation.” In Proceedings of the Thirteenth Annual Conference on Computational Learning Theory (COLT 2000), June 28 - July 1, 2000, Palo Alto, California, edited by Nicolò Cesa-Bianchi and Sally A. Goldman, 100–106. Morgan Kaufmann. https://users.soe.ucsc.edu/~manfred/pubs/C56.pdf.