Convex duality is an important tool in machine learning. Here I present an insightful arrangement of the graphs of a convex function, its dual and their derivatives.

Introduction

Convex optimisation, and in particular convex duality, is an important technique that is very useful in machine learning. We investigate one way to visualise what is going on.

Convex duality

Let \(f : \mathbb R \to \mathbb R\) be a convex function. The convex conjugate or dual \(f^*\) of \(f\) is defined by \begin{equation*} f^*(y) ~:=~ \sup_x x y - f(x) \end{equation*} Under regularity conditions (Legendre), the supremum above is attained where the derivative vanishes, and we find that the maxiser \(x\) must satisfy \(y = \nabla f(x)\). We also have \begin{equation*} \nabla f^* ~=~ (\nabla f)^{-1} \end{equation*}

This identity suggests an arrangements of graphs relating a function to its dual via its derivative that I have not encountered before.

Primal (top left), dual (top right) and their derivatives (middle bottom). The derivative and its inverse convert from primal to dual coordinates and vice versa.

Primal (top left), dual (top right) and their derivatives (middle bottom). The derivative and its inverse convert from primal to dual coordinates and vice versa.

The function plotted above is the following: \begin{align*} f(x) &~=~ -\sin(x) & f'(x) &~=~ -\cos(x) \\ f^*(z) &~=~ \sqrt{1-z^2} + z \operatorname{acos}(-z) & {f^*}'(z) &~=~ \operatorname{acos}(-z) \end{align*}

Bonus material: non-differentiable \(f\)

Convex duality gets a little hairier when the convex function \(f\) is not differentiable, but with appropriate open-mindedness (set-valuedness) the above idea carries over. Here is the correspondence plot for the prototypical non-differentiable function: the absolute value.

Primal (top left), dual (top right) and their derivatives (middle bottom). The derivative and its inverse convert from primal to dual coordinates and vice versa.

Primal (top left), dual (top right) and their derivatives (middle bottom). The derivative and its inverse convert from primal to dual coordinates and vice versa.

And here are the corresponding formulas. \begin{align*} f(x) &~=~ |x| & f'(x) &~=~ \begin{cases} 1 & x>0 \\ [-1,1] & x=0 \\ -1 & x < 0 \end{cases} \\ f^*(z) &~=~ \begin{cases} 0 & -1 \le z \le 1 \\ \infty & \text{o.w.} \end{cases} & {f^*}'(z) &~=~ \begin{cases} - \infty & z < -1 \\ [-\infty,0] & z=-1 \\ 0 & -1 < z < 1 \\ [0,\infty] & z = 1 \\ \infty & z > 1 \end{cases} \end{align*}