# Tight Diagonal Upper Bounds on Dyads

2015-11-19

Abstract

We look at bounding a symmetric rank one matrix from above by a diagonal matrix.

# Introduction

The topic of this post came up during a discussion with Tim van Erven. Sometimes it is important to bound a matrix form above by a diagonal matrix. For example, if the original matrix describes covariance structure between features then a diagonal matrix corresponds to independence. In this post we obtain a diagonal upper bound for a symmetric rank one matrix $$\boldsymbol{\ell}\boldsymbol{\ell}^\intercal$$ (a matrix of this form is sometimes called a dyad). So we are looking for a diagonal matrix $$\boldsymbol{D}$$ such that $\boldsymbol{D} ~\succeq~ \boldsymbol{\ell}\boldsymbol{\ell}^\intercal$ where $$\succeq$$ denotes the Löwner order ($$\boldsymbol{A}\succeq \boldsymbol{B}$$ if $$\boldsymbol{A}- \boldsymbol{B}$$ is positive semi-definite). Now there are many matrices $$\boldsymbol{D}$$ satisfying the above requirement. We want to find the best one. To define optimality we need to put forward a scalar-valued objective. Here we consider minimising the trace of $$\boldsymbol{D}$$. So we are looking for $V ~:=~ \min_{\substack{ \boldsymbol{D}\succeq \boldsymbol{\ell}\boldsymbol{\ell}^\intercal, \\ \text{\boldsymbol{D} diagonal} }}~ \mathop{\mathrm{tr}}(\boldsymbol{D}) .$ As a first step, let us reparametrise by $$\boldsymbol{D}= \mathop{\mathrm{diag}}(\boldsymbol{d})$$. Then we are looking for $V ~=~ \min_{ \mathop{\mathrm{diag}}(\boldsymbol{d}) \succeq \boldsymbol{\ell}\boldsymbol{\ell}^\intercal }~ \sum_i d_i .$ Now how to deal with the matrix inequality? For $$\boldsymbol{D}\succeq \boldsymbol{0}$$ we have $$\boldsymbol{D}\succeq \boldsymbol{\ell}\boldsymbol{\ell}^\intercal$$ iff $$\boldsymbol{\ell}^\intercal\boldsymbol{D}^{-1} \boldsymbol{\ell}\le 1$$. This follows from reasoning about the Schur complements of the matrix $$\begin{bmatrix} \boldsymbol{D}& \boldsymbol{\ell}\\ \boldsymbol{\ell}^\intercal& 1 \end{bmatrix}$$. Furthermore, inverting a diagonal matrix simply inverts its diagonal elements. So we need to solve $V ~=~ \min_{ \boldsymbol{d}\ge 0 : \sum_i l_i^2/d_i \le 1 %\l^\top \diag(\d^{-1}) \l \le 1 }~ \sum_i d_i .$ Introducing a Lagrange multiplier $$\mu$$ for the constraint, we find $V ~=~ \max_{\mu \ge 0} \min_{ \boldsymbol{d}\ge 0 %\l^\top \diag(\d^{-1}) \l \le 1 }~ \sum_i d_i + \mu \left( \sum_i l_i^2/d_i - 1 \right) .$ We solve the innermost optimisation by setting the $$d_i$$-derivative to zero. We find $0 ~=~ 1 - \mu l_i^2/d_i^2 \qquad \text{and hence} \qquad d_i ~=~ \sqrt{\mu} |l_i|$ Substituting this back in, it hence remains to solve $V ~=~ \max_{\mu \ge 0}~ \sum_i \sqrt{\mu} |l_i| + \mu \left( \sum_i \frac{l_i^2}{\sqrt{\mu} |l_i|} - 1 \right) ~=~ \max_{\mu \ge 0}~ 2 \sqrt{\mu} \sum_i |l_i| - \mu .$ Optimising $$\mu$$ by equating its derivative to zero results in $0 ~=~ \frac{1}{\sqrt{\mu}} \sum_i |l_i| - 1 \qquad \text{so that} \qquad \mu ~=~ \Big( \sum_i |l_i| \Big)^2 .$ We hence find optimiser and optimal value $d_i ~=~ |l_i| \sum_j |l_j| \qquad \text{and} \qquad V ~=~ \Big(\sum_i |l_i|\Big)^2 .$ Okay, that is completely explicit. So how far does this method stretch?

## Schatten $$p$$-norm Objectives

The previous result is specific to the scalar objective chosen. Yet the method generalises from the trace to any Schatten $$p$$-norm as follows. Define $V ~:=~ \min_{\substack{ \boldsymbol{D}\succeq \boldsymbol{\ell}\boldsymbol{\ell}^\intercal, \\ \text{\boldsymbol{D} diagonal} }}~ \|\boldsymbol{D}\|_p$ where $$\|\boldsymbol{D}\|_p$$ denotes the Schatten $$p$$-norm, that is, the vector $$p$$-norm $$\|\boldsymbol{\ell}\|_p = (\sum_i |l_i|^p)^{1/p}$$ applied to the eigenvalues of the matrix $$\boldsymbol{D}$$. We raise the objective to the power $$p$$ for convenience and rewrite as before $V^p ~=~ \min_{ \boldsymbol{d}\ge 0 : \sum_i l_i^2/d_i \le 1 }~ \sum_i d_i^p .$ Introducing Lagrange multiplier $$\mu$$ for the constraint, we find $V^p ~=~ \max_{\mu \ge 0} \min_{ \boldsymbol{d}\ge 0 }~ \sum_i d_i^p + \mu \left(\sum_i l_i^2/d_i - 1\right) .$ Setting the derivative to $$0$$ results in $d_i ~=~ \left( \frac{ \mu l_i^2 }{ p } \right)^{\frac{1}{p+1}}$ so that $V^p ~=~ \max_{\mu \ge 0}~ (p^{-\frac{p}{p+1}} + p^{\frac{1}{p+1}}) \mu^{\frac{p}{p+1}} \sum_i l_i^{\frac{2 p}{p+1}} - \mu .$ Now optimising for $$\mu$$ results in $\mu ~=~ p \bigg(\sum_i l_i^{\frac{2 p}{p+1}}\bigg)^{p+1}$ so that we have optimiser and value $d_i ~=~ l_i^{\frac{2}{p+1}} \sum_j l_j^{\frac{2 p}{p+1}} \qquad \text{and} \qquad V ~=~ \bigg( \sum_j l_j^{\frac{2 p}{p+1}} \bigg)^{\frac{p+1}{p}} .$

#### Examples

The case $$p=1$$ is called the trace norm or nuclear norm, where we find $d_i ~=~ |l_i| \sum_j |l_j| \qquad \text{and} \qquad V ~=~ \Big(\sum_j |l_j|\Big)^{2} .$ The case $$p=2$$ is called the Frobenius norm where we find $d_i ~=~ l_i^{\frac{2}{3}} \sum_j l_j^{\frac{4}{3}} \qquad \text{and} \qquad V ~=~ \bigg(\sum_j l_j^{\frac{4}{3}}\bigg)^{3/2} .$ The case $$p=\infty$$ (maximum eigenvalue) is called the spectral norm where we find $d_i ~=~ \sum_j l_j^2 \qquad \text{and} \qquad V ~=~ \sum_j l_j^2 .$ Note that in this last case the $$d_i$$ are all the same, so the optimal matrix is a multiple of the identity matrix.

A case of interest is also $$p=0$$. The above analysis probably does not work (as $$p$$-norms switch from concave to convex at $$p=1$$), but we can still show in a different way that the final formula gives the correct answer, i.e. we find $d_i ~=~ n l_i^2 \qquad \text{and} \qquad V ~=~ n .$