2015-09-08
We look at a simple quadratic minimax problem on the Euclidean sphere, and consider what happens when it flips from convex to concave.
This is a sneak preview of a nice result with Malek, Bartlett and Abbasi-Yadkori to be presented in (Koolen et al. 2015). Fix \(\alpha \in \mathbb R\) and \(\boldsymbol{b}\in \mathbb R^d\) such that \(\|\boldsymbol{b}\| \le 1\). Let us consider the following min-max problem \[V_\alpha ~:=~ \min_{\boldsymbol{a}\in \mathbb R^d} \max_{\substack{\boldsymbol{x}\in \mathbb R^d \\ \|\boldsymbol{x}\| \le 1}} ~ \|\boldsymbol{a}-\boldsymbol{x}\|^2 + (\alpha-1) \|\boldsymbol{x}\|^2 + 2 \boldsymbol{b}^\intercal\boldsymbol{x}\] as a function of \(\alpha\). (Here \(\|\boldsymbol{x}\|^2 = \boldsymbol{x}^\intercal\boldsymbol{x}\) denotes the Ecludean norm squared.) In this post we establish that the value and minimiser are \[V_\alpha ~=~ \begin{cases} \frac{\|\boldsymbol{b}\|^2}{1-\alpha} & \text{if $\alpha \le 0$,} \\[.2em] \|\boldsymbol{b}\|^2 + \alpha & \text{if $\alpha \ge 0$,} \end{cases} \qquad \text{and} \qquad \boldsymbol{a} ~=~ \begin{cases} \frac{\boldsymbol{b}}{1-\alpha} & \text{if $\alpha \le 0$,} \\[.2em] \boldsymbol{b}& \text{if $\alpha \ge 0$.} \end{cases}\] The two pieces of \(V_\alpha\) meet at \(\alpha = 0\), where the value is \(\|\boldsymbol{b}\|^2\). The left and right derivatives at \(\alpha = 0\) are \[\|\boldsymbol{b}\|^2 \qquad \text{and} \qquad 1\] and so the function \(V_\alpha\) is differentiable iff \(\|\boldsymbol{b}\|=1\).
Why is this min-max problem cool? Our problem is an easy example of a min-max function that flips (at \(\alpha=0\)) from convex-concave to convex-convex. The former type of functions admit nice minimax analysis. The latter is generally much more complicated to deal with. Yet in our example it is still analytically tractable.
Let us rewrite \begin{equation}\label{eq:value} V_\alpha ~=~ \min_{\boldsymbol{a}\in \mathbb R^d} \max_{\substack{\boldsymbol{x}\in \mathbb R^d \\ \|\boldsymbol{x}\| \le 1}} ~ \|\boldsymbol{a}\|^2 + \alpha \|\boldsymbol{x}\|^2 + 2 (\boldsymbol{b}- \boldsymbol{a})^\intercal\boldsymbol{x} . \end{equation}
The objective is concave in \(\boldsymbol{x}\). The maximum is hence found at zero derivative, i.e. \[\boldsymbol{x} ~=~ \frac{\boldsymbol{a}- \boldsymbol{b}}{\alpha} ,\] (it remains to check this is feasible) where the value equals \[V_\alpha ~=~ \min_{\boldsymbol{a}\in \mathbb R^d} ~ \|\boldsymbol{a}\|^2 - \frac{\|\boldsymbol{a}- \boldsymbol{b}\|^2}{\alpha} .\] As \(\alpha \le 0\) this is convex in \(\boldsymbol{a}\), and hence minimised at vanishing derivative. That is, \[\boldsymbol{a} ~=~ \frac{\boldsymbol{b}}{1-\alpha} ,\] (back substitution reveals \(\boldsymbol{x}= \frac{\boldsymbol{b}}{1-\alpha}\), which is indeed feasible as \(\|\boldsymbol{b}\| \le 1\) by assumption and we are considering the case \(\alpha \le 0\)) where the value equals \[V_\alpha ~=~ \frac{\|\boldsymbol{b}\|^2}{1-\alpha} .\] Note that \(\max_{\boldsymbol{x}} \min_{\boldsymbol{a}}\) results in inner minimiser \(\boldsymbol{a}=\boldsymbol{x}\) and then again in outer maximiser \(\boldsymbol{x}= \frac{\boldsymbol{b}}{1-\alpha}\). Therefore the pair \(\boldsymbol{a}=\boldsymbol{x}=\frac{\boldsymbol{b}}{1-\alpha}\) form a saddle point or Nash equilibrium (are mutually best responses).
The objective is convex in \(\boldsymbol{x}\), and it is hence maximised on the unit ball at the point of the unit sphere that maximises the linear term, i.e. at \[\boldsymbol{x}~=~ \frac{\boldsymbol{b}-\boldsymbol{a}}{\|\boldsymbol{b}-\boldsymbol{a}\|} ,\] Plugging this in, we find \[V_\alpha ~=~ \min_{\boldsymbol{a}\in \mathbb R^d} ~ \|\boldsymbol{a}\|^2 + \alpha + 2 \|\boldsymbol{b}- \boldsymbol{a}\| .\] This is a concave function of \(\boldsymbol{a}\), although it is not differentiable due to the norm. Yet it is minimised where zero is a sub-gradient. In our case this happens at \[\boldsymbol{a} ~=~ \boldsymbol{b},\] where we used the assumption that \(\|\boldsymbol{b}\| \le 1\). We conclude \[V_\alpha ~=~ \|\boldsymbol{b}\|^2 + \alpha .\] Notice that the choice \(\boldsymbol{a}=\boldsymbol{b}\) kills the linear term in \(\boldsymbol{x}\) of the objective \(\eqref{eq:value}\), and hence equalises the objective over the unit sphere.
We found two algorithms, one for \(\alpha \le 0\) and one for \(\alpha \ge 0\). Now suppose we can only play one of them in all cases. How bad would that be?
First suppose we always play \(\boldsymbol{a}=\boldsymbol{b}\). This kills the linear term in \(\boldsymbol{x}\). So the objective is maximised in \(\boldsymbol{x}\) either anywhere on the unit sphere if \(\alpha \ge 0\) or at \(\boldsymbol{x}=0\) if \(\alpha \le 0\). The worst-case value is hence \[\begin{cases} \|\boldsymbol{b}\|^2 & \text{if $\alpha \le 0$,} \\ \|\boldsymbol{b}\|^2 + \alpha & \text{if $\alpha \ge 0$.} \end{cases}\]
Now suppose we always play \(\boldsymbol{a}= \frac{\boldsymbol{b}}{1-\alpha}\). Then \[\max_{\substack{\boldsymbol{x}\in \mathbb R^d \\ \|\boldsymbol{x}\| \le 1}} ~ \frac{\|\boldsymbol{b}\|^2}{(1-\alpha)^2} + \alpha \|\boldsymbol{x}\|^2 - \frac{2 \alpha}{1-\alpha} \boldsymbol{b}^\intercal\boldsymbol{x}\] is maximised at \(\boldsymbol{x}= \frac{ - \frac{\alpha}{1-\alpha} \boldsymbol{b}}{\| - \frac{\alpha}{1-\alpha} \boldsymbol{b}\|}\) if \(\alpha \ge 0\) where it takes value \[\frac{\|\boldsymbol{b}\|^2}{(1-\alpha)^2} + \alpha + 2 \frac{\alpha}{|1-\alpha|} \|\boldsymbol{b}\|\] or at \(\boldsymbol{x}=\frac{\boldsymbol{b}}{1-\alpha}\) if \(\alpha \le 0\) where it takes value \[\frac{\|\boldsymbol{b}\|^2}{1-\alpha}\] as before.
We now look at what happens if we play an algorithm designed to operate against data satisfying the constraint \(\|\boldsymbol{x}\| \le 1\), and encounter data violating that constraint.
Say \(\alpha \le0\) and we play \(\boldsymbol{a}= \frac{\boldsymbol{b}}{1-\alpha}\). Then the objective is \[%\frac{\norm{\b}^2}{(1-\alpha)^2} %+ \alpha \norm{\x}^2 %- 2 \frac{\alpha}{1-\alpha} \b^\top \x %~=~ \frac{\|\boldsymbol{b}\|^2 }{1-\alpha} + \alpha \Big\|\frac{\boldsymbol{b}}{1-\alpha} - \boldsymbol{x}\Big\|^2 ~\le~ \frac{\|\boldsymbol{b}\|^2 }{1-\alpha}\] regardless of \(\boldsymbol{x}\).
Say \(\alpha \ge 0\) and we play \(\boldsymbol{a}=\boldsymbol{b}\). Then the objective is \[\|\boldsymbol{b}\|^2 + \alpha \|\boldsymbol{x}\|^2\] which scales nicely with the actual data norm.
So in both cases we see that we do not need to assume the norm bound on the data. The convex-concave worst case is independent of any norm bound. The convex-convex case adapts to the norm bound without needing to know it.