Bimatrix Games

Wouter M. Koolen



We consider two-player zero-sum matrix games. Such games have a single round in which the two players independently choose a strategy. If player one chooses \(i\) and player two chooses \(j\) then the loss of player one and the gain of player two are given by the entry \(A_{i,j}\) of the so-called payoff matrix \(\boldsymbol{A}\) that completely specifies the game.

A matrix game has a saddle point in mixed strategies, where both players randomise. We investigate how many strategies are used in optimal play for each player. This is interesting e.g. if \(\boldsymbol{A}\) is not square. It turns out that there always is a saddle point where both players randomise over the same number of strategies.


Fix a matrix \(\boldsymbol{A}\), not necessarily square. We consider the problem \[\min_{\boldsymbol{p}} \max_{\boldsymbol{q}} \, \boldsymbol{p}^\intercal\boldsymbol{A}\boldsymbol{q}\] where \(\boldsymbol{p}\) and \(\boldsymbol{q}\) range over the simplex. We first rewrite it as \[\min_{\substack{ \boldsymbol{p}, \lambda \\ \forall j: \boldsymbol{p}^\intercal\boldsymbol{A}\boldsymbol{e}_j \le \lambda }} \lambda\] By the KKT conditions, there are primal variables \((\boldsymbol{p}, \lambda)\) and dual variables \((\mu, \boldsymbol{\nu}, \boldsymbol{q})\) such that \begin{align*} p_i &~\ge~ 0 \\ \sum_i p_i &~=~ 1 \\ \boldsymbol{p}^\intercal\boldsymbol{A}\boldsymbol{e}_j &~\le~ \lambda \\ q_j &~\ge~ 0 \\ \nu_i &~\ge~ 0 \\ - \nu_i p_i &~=~ 0 \\ q_j (\boldsymbol{p}^\intercal\boldsymbol{A}\boldsymbol{e}_j - \lambda) &~=~ 0 \\ - \boldsymbol{\nu}+ \boldsymbol{A}\boldsymbol{q}- \mu \boldsymbol{1}&~=~ 0 \\ 1 - \sum_j q_j &~=~ 0 \end{align*} Eliminating \(\boldsymbol{\nu}= \boldsymbol{A}\boldsymbol{q}- \mu \boldsymbol{1}\) and reorganising, we get \begin{align*} p_i &~\ge~ 0 \\ \sum_i p_i &~=~ 1 \\ q_j &~\ge~ 0 \\ \sum_j q_j &~=~ 1 \\ \boldsymbol{p}^\intercal\boldsymbol{A}\boldsymbol{e}_j &~\le~ \lambda \\ \boldsymbol{e}_i^\intercal\boldsymbol{A}\boldsymbol{q}&~\ge~ \mu \\ p_i (\boldsymbol{e}_i^\intercal\boldsymbol{A}\boldsymbol{q}- \mu) &~=~ 0 \\ q_j (\boldsymbol{p}^\intercal\boldsymbol{A}\boldsymbol{e}_j - \lambda) &~=~ 0 \end{align*} Summing the last two equations over \(i\) and \(j\) we find \[\mu ~=~ \lambda ~=~ \boldsymbol{p}^\intercal\boldsymbol{A}\boldsymbol{q}\] So that \(\boldsymbol{p}\) and \(\boldsymbol{q}\) form a saddle point iff \begin{align*} p_i &~\ge~ 0 \\ \sum_i p_i &~=~ 1 \\ q_j &~\ge~ 0 \\ \sum_j q_j &~=~ 1 \\ \boldsymbol{p}^\intercal\boldsymbol{A}\boldsymbol{e}_j &~\le~ \boldsymbol{p}^\intercal\boldsymbol{A}\boldsymbol{q} \\ \boldsymbol{e}_i^\intercal\boldsymbol{A}\boldsymbol{q}&~\ge~ \boldsymbol{p}^\intercal\boldsymbol{A}\boldsymbol{q} \\ p_i (\boldsymbol{e}_i^\intercal\boldsymbol{A}\boldsymbol{q}- \boldsymbol{p}^\intercal\boldsymbol{A}\boldsymbol{q}) &~=~ 0 \\ q_j (\boldsymbol{p}^\intercal\boldsymbol{A}\boldsymbol{e}_j - \boldsymbol{p}^\intercal\boldsymbol{A}\boldsymbol{q}) &~=~ 0 \end{align*} Now what does this say about the support size of \(\boldsymbol{p}\) and \(\boldsymbol{q}\)?

Say the support of \(\boldsymbol{q}\) is larger than that of \(\boldsymbol{p}\), and say that \(\boldsymbol{p}\) and \(\boldsymbol{q}\) are both strictly positive (otherwise trim \(\boldsymbol{A}\)). So we must have for all \(i,j\) that \begin{align*} \boldsymbol{e}_i^\intercal\boldsymbol{A}\boldsymbol{q}&~=~ \boldsymbol{p}^\intercal\boldsymbol{A}\boldsymbol{q} \\ \boldsymbol{p}^\intercal\boldsymbol{A}\boldsymbol{e}_j &~=~ \boldsymbol{p}^\intercal\boldsymbol{A}\boldsymbol{q} \end{align*} If we can find a \(\boldsymbol{v}\neq 0\) such that \(\sum_i v_i = 0\) and \(\boldsymbol{e}_i^\intercal\boldsymbol{A}\boldsymbol{v}= 0\) for all \(i\) then \(\boldsymbol{q}+ \epsilon \boldsymbol{v}\) also satisfies the conditions, and we can move through solution space until we hit a new zero. Now when \(\boldsymbol{A}\) has full row rank there are \(d\) equations. So we have a non-zero solution for \(d+1\) variables. If \(\boldsymbol{A}\) is not full row rank then there are at most \(d\) equations, and again from \(d+1\) variables we have redundancy.

It follows that there is a saddle point with identical support sizes.