We look at a simple quadratic minimax problem on the Euclidean sphere, and consider what happens when it flips from convex to concave.

Introduction

This is a sneak preview of a nice result with Malek, Bartlett and Abbasi-Yadkori to be presented in Koolen et al. (2015). Fix \(\alpha \in {\mathbb R}\) and \({\boldsymbol{b}}\in {\mathbb R}^d\) such that \({\|{\boldsymbol{b}}\|} \le 1\). Let us consider the following min-max problem \[V_\alpha ~:=~ \min_{{\boldsymbol{a}}\in {\mathbb R}^d} \max_{\substack{{\boldsymbol{x}}\in {\mathbb R}^d \\ {\|{\boldsymbol{x}}\|} \le 1}} ~ {\|{\boldsymbol{a}}-{\boldsymbol{x}}\|}^2 + (\alpha-1) {\|{\boldsymbol{x}}\|}^2 + 2 {\boldsymbol{b}}^{\intercal}{\boldsymbol{x}}\] as a function of \(\alpha\). (Here \({\|{\boldsymbol{x}}\|}^2 = {\boldsymbol{x}}^{\intercal}{\boldsymbol{x}}\) denotes the Ecludean norm squared.) In this post we establish that the value and minimiser are \[V_\alpha ~=~ \begin{cases} \frac{{\|{\boldsymbol{b}}\|}^2}{1-\alpha} & \text{if $\alpha \le 0$,} \\[.2em] {\|{\boldsymbol{b}}\|}^2 + \alpha & \text{if $\alpha \ge 0$,} \end{cases} \qquad \text{and} \qquad {\boldsymbol{a}}~=~ \begin{cases} \frac{{\boldsymbol{b}}}{1-\alpha} & \text{if $\alpha \le 0$,} \\[.2em] {\boldsymbol{b}}& \text{if $\alpha \ge 0$.} \end{cases}\] The two pieces of \(V_\alpha\) meet at \(\alpha = 0\), where the value is \({\|{\boldsymbol{b}}\|}^2\). The left and right derivatives at \(\alpha = 0\) are \[{\|{\boldsymbol{b}}\|}^2 \qquad \text{and} \qquad 1\] and so the function \(V_\alpha\) is differentiable iff \({\|{\boldsymbol{b}}\|}=1\).

Why is this min-max problem cool? Our problem is an easy example of a min-max function that flips (at \(\alpha=0\)) from convex-concave to convex-convex. The former type of functions admit nice minimax analysis. The latter is generally much more complicated to deal with. Yet in our example it is still analytically tractable.

Graph of V_\alpha

Graph of \(V_\alpha\)

Analysis

Let us rewrite \begin{equation}\label{eq:value} V_\alpha ~=~ \min_{{\boldsymbol{a}}\in {\mathbb R}^d} \max_{\substack{{\boldsymbol{x}}\in {\mathbb R}^d \\ {\|{\boldsymbol{x}}\|} \le 1}} ~ {\|{\boldsymbol{a}}\|}^2 + \alpha {\|{\boldsymbol{x}}\|}^2 + 2 ({\boldsymbol{b}}- {\boldsymbol{a}})^{\intercal}{\boldsymbol{x}}. \end{equation}

The convex-concave case \(\alpha \le 0\)

The objective is concave in \({\boldsymbol{x}}\). The maximum is hence found at zero derivative, i.e.\[{\boldsymbol{x}}~=~ \frac{{\boldsymbol{a}}- {\boldsymbol{b}}}{\alpha} ,\] (it remains to check this is feasible) where the value equals \[V_\alpha ~=~ \min_{{\boldsymbol{a}}\in {\mathbb R}^d} ~ {\|{\boldsymbol{a}}\|}^2 - \frac{{\|{\boldsymbol{a}}- {\boldsymbol{b}}\|}^2}{\alpha} .\] As \(\alpha \le 0\) this is convex in \({\boldsymbol{a}}\), and hence minimised at vanishing derivative. That is, \[{\boldsymbol{a}}~=~ \frac{{\boldsymbol{b}}}{1-\alpha} ,\] (back substitution reveals \({\boldsymbol{x}}= \frac{{\boldsymbol{b}}}{1-\alpha}\), which is indeed feasible as \({\|{\boldsymbol{b}}\|} \le 1\) by assumption and we are considering the case \(\alpha \le 0\)) where the value equals \[V_\alpha ~=~ \frac{{\|{\boldsymbol{b}}\|}^2}{1-\alpha} .\] Note that \(\max_{{\boldsymbol{x}}} \min_{{\boldsymbol{a}}}\) results in inner minimiser \({\boldsymbol{a}}={\boldsymbol{x}}\) and then again in outer maximiser \({\boldsymbol{x}}= \frac{{\boldsymbol{b}}}{1-\alpha}\). Therefore the pair \({\boldsymbol{a}}={\boldsymbol{x}}=\frac{{\boldsymbol{b}}}{1-\alpha}\) form a saddle point or Nash equilibrium (are mutually best responses).

The convex-convex case \(\alpha \ge 0\)

The objective is convex in \({\boldsymbol{x}}\), and it is hence maximised on the unit ball at the point of the unit sphere that maximises the linear term, i.e. at \[{\boldsymbol{x}}~=~ \frac{{\boldsymbol{b}}-{\boldsymbol{a}}}{{\|{\boldsymbol{b}}-{\boldsymbol{a}}\|}} ,\] Plugging this in, we find \[V_\alpha ~=~ \min_{{\boldsymbol{a}}\in {\mathbb R}^d} ~ {\|{\boldsymbol{a}}\|}^2 + \alpha + 2 {\|{\boldsymbol{b}}- {\boldsymbol{a}}\|} .\] This is a concave function of \({\boldsymbol{a}}\), although it is not differentiable due to the norm. Yet it is minimised where zero is a sub-gradient. In our case this happens at \[{\boldsymbol{a}}~=~ {\boldsymbol{b}},\] where we used the assumption that \({\|{\boldsymbol{b}}\|} \le 1\). We conclude \[V_\alpha ~=~ {\|{\boldsymbol{b}}\|}^2 + \alpha .\] Notice that the choice \({\boldsymbol{a}}={\boldsymbol{b}}\) kills the linear term in \({\boldsymbol{x}}\) of the objective \(\eqref{eq:value}\), and hence equalises the objective over the unit sphere.

Using a single algorithm throughout

We found two algorithms, one for \(\alpha \le 0\) and one for \(\alpha \ge 0\). Now suppose we can only play one of them in all cases. How bad would that be?

First suppose we always play \({\boldsymbol{a}}={\boldsymbol{b}}\). This kills the linear term in \({\boldsymbol{x}}\). So the objective is maximised in \({\boldsymbol{x}}\) either anywhere on the unit sphere if \(\alpha \ge 0\) or at \({\boldsymbol{x}}=0\) if \(\alpha \le 0\). The worst-case value is hence \[\begin{cases} {\|{\boldsymbol{b}}\|}^2 & \text{if $\alpha \le 0$,} \\ {\|{\boldsymbol{b}}\|}^2 + \alpha & \text{if $\alpha \ge 0$.} \end{cases}\]

Now suppose we always play \({\boldsymbol{a}}= \frac{{\boldsymbol{b}}}{1-\alpha}\). Then \[\max_{\substack{{\boldsymbol{x}}\in {\mathbb R}^d \\ {\|{\boldsymbol{x}}\|} \le 1}} ~ \frac{{\|{\boldsymbol{b}}\|}^2}{(1-\alpha)^2} + \alpha {\|{\boldsymbol{x}}\|}^2 - \frac{2 \alpha}{1-\alpha} {\boldsymbol{b}}^{\intercal}{\boldsymbol{x}}\] is maximised at \({\boldsymbol{x}}= \frac{ - \frac{\alpha}{1-\alpha} {\boldsymbol{b}}}{{\| - \frac{\alpha}{1-\alpha} {\boldsymbol{b}}\|}}\) if \(\alpha \ge 0\) where it takes value \[\frac{{\|{\boldsymbol{b}}\|}^2}{(1-\alpha)^2} + \alpha + 2 \frac{\alpha}{{|1-\alpha|}} {\|{\boldsymbol{b}}\|}\] or at \({\boldsymbol{x}}=\frac{{\boldsymbol{b}}}{1-\alpha}\) if \(\alpha \le 0\) where it takes value \[\frac{{\|{\boldsymbol{b}}\|}^2}{1-\alpha}\] as before.

Graph of value for sub-optimal algorithms.

Graph of value for sub-optimal algorithms.

Violated norm bound

We now look at what happens if we play an algorithm designed to operate against data satisfying the constraint \({\|{\boldsymbol{x}}\|} \le 1\), and encounter data violating that constraint.

Convex-concave

Say \(\alpha \le0\) and we play \({\boldsymbol{a}}= \frac{{\boldsymbol{b}}}{1-\alpha}\). Then the objective is \[\frac{{\|{\boldsymbol{b}}\|}^2 }{1-\alpha} + \alpha {\Big\|\frac{{\boldsymbol{b}}}{1-\alpha} - {\boldsymbol{x}}\Big\|}^2 ~\le~ \frac{{\|{\boldsymbol{b}}\|}^2 }{1-\alpha}\] regardless of \({\boldsymbol{x}}\).

Convex-convex

Say \(\alpha \ge 0\) and we play \({\boldsymbol{a}}={\boldsymbol{b}}\). Then the objective is \[{\|{\boldsymbol{b}}\|}^2 + \alpha {\|{\boldsymbol{x}}\|}^2\] which scales nicely with the actual data norm.

So in both cases we see that we do not need to assume the norm bound on the data. The convex-concave worst case is independent of any norm bound. The convex-convex case adapts to the norm bound without needing to know it.

Koolen, Wouter M., Alan Malek, Peter L. Bartlett, and Yasin Abbasi-Yadkori. 2015. “Minimax Time Series Prediction.” In Advances in Neural Information Processing Systems (NIPS) 28, to appear.