Tight Diagonal Upper Bounds on Dyads

Wouter M. Koolen

2015-11-19

Abstract

We look at bounding a symmetric rank one matrix from above by a diagonal matrix.

Introduction

The topic of this post came up during a discussion with Tim van Erven. Sometimes it is important to bound a matrix form above by a diagonal matrix. For example, if the original matrix describes covariance structure between features then a diagonal matrix corresponds to independence. In this post we obtain a diagonal upper bound for a symmetric rank one matrix $\boldsymbol{\ell}\boldsymbol{\ell}^\intercal$ (a matrix of this form is sometimes called a dyad). So we are looking for a diagonal matrix $\boldsymbol{D}$ such that \[\boldsymbol{D} ~\succeq~ \boldsymbol{\ell}\boldsymbol{\ell}^\intercal\] where $\succeq$ denotes the Löwner order ($\boldsymbol{A}\succeq \boldsymbol{B}$ if $\boldsymbol{A}- \boldsymbol{B}$ is positive semi-definite). Now there are many matrices $\boldsymbol{D}$ satisfying the above requirement. We want to find the best one. To define optimality we need to put forward a scalar-valued objective. Here we consider minimising the trace of $\boldsymbol{D}$. So we are looking for \[V ~:=~ \min_{\substack{ \boldsymbol{D}\succeq \boldsymbol{\ell}\boldsymbol{\ell}^\intercal, \\ \text{$\boldsymbol{D}$ diagonal} }}~ \mathop{\mathrm{tr}}(\boldsymbol{D}) .\] As a first step, let us reparametrise by $\boldsymbol{D}= \mathop{\mathrm{diag}}(\boldsymbol{d})$. Then we are looking for \[V ~=~ \min_{ \mathop{\mathrm{diag}}(\boldsymbol{d}) \succeq \boldsymbol{\ell}\boldsymbol{\ell}^\intercal }~ \sum_i d_i .\] Now how to deal with the matrix inequality? For $\boldsymbol{D}\succeq \boldsymbol{0}$ we have $\boldsymbol{D}\succeq \boldsymbol{\ell}\boldsymbol{\ell}^\intercal$ iff $\boldsymbol{\ell}^\intercal\boldsymbol{D}^{-1} \boldsymbol{\ell}\le 1$. This follows from reasoning about the Schur complements of the matrix $\begin{bmatrix} \boldsymbol{D}& \boldsymbol{\ell}\\ \boldsymbol{\ell}^\intercal& 1 \end{bmatrix}$. Furthermore, inverting a diagonal matrix simply inverts its diagonal elements. So we need to solve \[V ~=~ \min_{ \boldsymbol{d}\ge 0 : \sum_i l_i^2/d_i \le 1 %\l^\top \diag(\d^{-1}) \l \le 1 }~ \sum_i d_i .\] Introducing a Lagrange multiplier $\mu$ for the constraint, we find \[V ~=~ \max_{\mu \ge 0} \min_{ \boldsymbol{d}\ge 0 %\l^\top \diag(\d^{-1}) \l \le 1 }~ \sum_i d_i + \mu \left( \sum_i l_i^2/d_i - 1 \right) .\] We solve the innermost optimisation by setting the $d_i$-derivative to zero. We find \[0 ~=~ 1 - \mu l_i^2/d_i^2 \qquad \text{and hence} \qquad d_i ~=~ \sqrt{\mu} |l_i|\] Substituting this back in, it hence remains to solve \[V ~=~ \max_{\mu \ge 0}~ \sum_i \sqrt{\mu} |l_i| + \mu \left( \sum_i \frac{l_i^2}{\sqrt{\mu} |l_i|} - 1 \right) ~=~ \max_{\mu \ge 0}~ 2 \sqrt{\mu} \sum_i |l_i| - \mu .\] Optimising $\mu$ by equating its derivative to zero results in \[0 ~=~ \frac{1}{\sqrt{\mu}} \sum_i |l_i| - 1 \qquad \text{so that} \qquad \mu ~=~ \Big( \sum_i |l_i| \Big)^2 .\] We hence find optimiser and optimal value \[d_i ~=~ |l_i| \sum_j |l_j| \qquad \text{and} \qquad V ~=~ \Big(\sum_i |l_i|\Big)^2 .\] Okay, that is completely explicit. So how far does this method stretch?

Schatten $p$-norm Objectives

The previous result is specific to the scalar objective chosen. Yet the method generalises from the trace to any Schatten $p$-norm as follows. Define \[V ~:=~ \min_{\substack{ \boldsymbol{D}\succeq \boldsymbol{\ell}\boldsymbol{\ell}^\intercal, \\ \text{$\boldsymbol{D}$ diagonal} }}~ \|\boldsymbol{D}\|_p\] where $\|\boldsymbol{D}\|_p$ denotes the Schatten $p$-norm, that is, the vector $p$-norm $\|\boldsymbol{\ell}\|_p = (\sum_i |l_i|^p)^{1/p}$ applied to the eigenvalues of the matrix $\boldsymbol{D}$. We raise the objective to the power $p$ for convenience and rewrite as before \[V^p ~=~ \min_{ \boldsymbol{d}\ge 0 : \sum_i l_i^2/d_i \le 1 }~ \sum_i d_i^p .\] Introducing Lagrange multiplier $\mu$ for the constraint, we find \[V^p ~=~ \max_{\mu \ge 0} \min_{ \boldsymbol{d}\ge 0 }~ \sum_i d_i^p + \mu \left(\sum_i l_i^2/d_i - 1\right) .\] Setting the derivative to $0$ results in \[d_i ~=~ \left( \frac{ \mu l_i^2 }{ p } \right)^{\frac{1}{p+1}}\] so that \[V^p ~=~ \max_{\mu \ge 0}~ (p^{-\frac{p}{p+1}} + p^{\frac{1}{p+1}}) \mu^{\frac{p}{p+1}} \sum_i l_i^{\frac{2 p}{p+1}} - \mu .\] Now optimising for $\mu$ results in \[\mu ~=~ p \bigg(\sum_i l_i^{\frac{2 p}{p+1}}\bigg)^{p+1}\] so that we have optimiser and value \[d_i ~=~ l_i^{\frac{2}{p+1}} \sum_j l_j^{\frac{2 p}{p+1}} \qquad \text{and} \qquad V ~=~ \bigg( \sum_j l_j^{\frac{2 p}{p+1}} \bigg)^{\frac{p+1}{p}} .\]

Examples

The case $p=1$ is called the trace norm or nuclear norm, where we find \[d_i ~=~ |l_i| \sum_j |l_j| \qquad \text{and} \qquad V ~=~ \Big(\sum_j |l_j|\Big)^{2} .\] The case $p=2$ is called the Frobenius norm where we find \[d_i ~=~ l_i^{\frac{2}{3}} \sum_j l_j^{\frac{4}{3}} \qquad \text{and} \qquad V ~=~ \bigg(\sum_j l_j^{\frac{4}{3}}\bigg)^{3/2} .\] The case $p=\infty$ (maximum eigenvalue) is called the spectral norm where we find \[d_i ~=~ \sum_j l_j^2 \qquad \text{and} \qquad V ~=~ \sum_j l_j^2 .\] Note that in this last case the $d_i$ are all the same, so the optimal matrix is a multiple of the identity matrix.

A case of interest is also $p=0$. The above analysis probably does not work (as $p$-norms switch from concave to convex at $p=1$), but we can still show in a different way that the final formula gives the correct answer, i.e. we find \[d_i ~=~ n l_i^2 \qquad \text{and} \qquad V ~=~ n .\]

Introduction

Schatten \(p\)-norm Objectives

Examples