The Density of a Random Variable

Wouter M. Koolen

2025-01-26

Abstract

We consider the rule for obtaining the density of a function of a random variable from the density of that random variable.

Introduction

Fix a probability measure \(Q\), and consider another probability measure \(P\) defined by density \(g\), i.e. \[d P ~=~ g \, d Q.\] Now consider a random variable \(Y\), and write \(P_Y\) and \(Q_Y\) for the distributions of \(Y\) under \(P\) and \(Q\). We are interested in the density of \(P_Y\) w.r.t. \(Q_Y\). So we are curious whether there is some \(f\) such that \[d P_Y ~=~ f \, d Q_Y.\] The observation of this post is that the following rule works: \begin{equation}\label{eq:rule} f ~=~ \operatorname*{\mathbb E}_Q[ g | Y] \end{equation} Here the conditional expectation is read as a function of the value of \(Y\). In this post we test drive this rule on some evil choices of \(Y\). What if \(Y\) is many-to-one? Contracting to a point? Fractal? But first, let’s prove the rule.

Proof

Fix any event \(B\) in \(Y\) space. Then we need to show \[P_Y(B) ~=~ \int_B \operatorname*{\mathbb E}_Q[g|Y] \, d Q_Y .\] Let’s start on the right. Then \[\int_B \operatorname*{\mathbb E}_Q[g|Y] \, d Q_Y ~=~ \int \mathbf 1_{\{Y \in B\}} \operatorname*{\mathbb E}_Q[g|Y] \, d Q ~=~ \operatorname*{\mathbb E}_Q \left[ \operatorname*{\mathbb E}_Q[\mathbf 1_{\{Y \in B\}} g|Y]\right] ~=~ \operatorname*{\mathbb E}_Q[\mathbf 1_{\{Y \in B\}} g] ~=~ \operatorname*{\mathbb E}_P[\mathbf 1_{\{Y \in B\}}] ~=~ P(Y \in B) ~=~ P_Y(B) .\] The steps are as follows. First, we change variables from \(Q_Y\) to \(Q\). Second, we move the indicator, which is determined by \(Y\), into the inner expectation. Third, we apply the tower rule. Fourth, we use that \(g\) is the density of \(P\) w.r.t. \(Q\). Finally, we rewrite the expectation as a probability under \(P\), and then under \(P_Y\). All bona fide.
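To see the rule at work in the simplest possible setting, here is a minimal sketch on a finite sample space, where conditional expectations are just weighted averages over fibres. The six-point space, the uniform \(Q\), the particular \(g\) and the map `Y` are all hypothetical choices made for illustration; they are not the measures used in the examples of this post.

```python
from fractions import Fraction as F
from collections import defaultdict

# Hypothetical toy setup (an assumption, not from the post): six outcomes, Q uniform.
omega = range(6)
Q = {w: F(1, 6) for w in omega}

# Density g = dP/dQ, chosen so that E_Q[g] = 1. Note that g is allowed to vanish.
g = {0: F(0), 1: F(1, 2), 2: F(1), 3: F(1), 4: F(3, 2), 5: F(2)}
P = {w: g[w] * Q[w] for w in omega}


def Y(w):
    """A many-to-one random variable: three outcomes per value."""
    return w % 2


def pushforward(measure):
    """Image measure of `measure` under Y."""
    out = defaultdict(F)  # F() is the zero fraction
    for w, mass in measure.items():
        out[Y(w)] += mass
    return dict(out)


Q_Y = pushforward(Q)
P_Y = pushforward(P)

# f(y) = E_Q[g | Y = y]: the Q-weighted average of g over the fibre {Y = y}.
f = {
    y: sum(g[w] * Q[w] for w in omega if Y(w) == y) / Q_Y[y]
    for y in Q_Y
}

# The rule dP_Y = f dQ_Y, checked pointwise on the finite Y space.
rule_holds = all(P_Y[y] == f[y] * Q_Y[y] for y in Q_Y)
print(f, rule_holds)
```

Exact rational arithmetic is used so that the identity can be checked with `==` rather than up to floating-point tolerance.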

Examples

We now illustrate the rule \(\eqref{eq:rule}\) for some choices for the measures \(P\), \(Q\) and the random variable \(Y\). We will consider several candidate \(Y\) below, and fix \(P\) and \(Q\) throughout as follows (we sometimes write \(P_X\) for the original measure to stress the contrast with the measure \(P_Y\) induced by \(Y\)).

[Figure: three panels, described in the caption below.]

(left) CDF of \(P\) and \(Q\). (middle) Lebesgue densities of \(P\) and \(Q\). (right) density/likelihood ratio/Radon-Nikodym derivative of \(P\) w.r.t. \(Q\).

\(P\) and \(Q\) have densities, as shown in the middle plot. \(P\) is also absolutely continuous w.r.t. \(Q\), and hence has a density/likelihood ratio/Radon-Nikodym derivative. Note that \(P\) has a region around \(X=1/2\) where it assigns probability zero while \(Q\) does not, but that is allowed. The reverse would not be.

Fun

Here we take as our random variable \(Y\) the function given by this plot:

Specification of \(Y\)

Here \(Y\) maps \([0,1]\) to \([-1/10, 1/2]\). Moreover, \(Y\) is not injective: it is many-to-one. Either two inputs or a whole input interval are mapped to the same output value. The resulting situation in terms of CDFs, PDFs, and densities is shown in the graphs below:

[Figure: three panels, described in the caption below.]

(left) CDF of \(P_Y\) and \(Q_Y\). (middle) Lebesgue densities of \(P_Y\) and \(Q_Y\). (right) density/likelihood ratio/Radon-Nikodym derivative of \(P_Y\) w.r.t. \(Q_Y\).

What we see is that \(P_Y\) and \(Q_Y\) do not have Lebesgue densities. The reason is that they have point masses at the values of \(Y\) where the graph of \(Y\) is flat. These are visible as the vertical spikes in the middle PDF graph. But, and this is the point of this post, these problems cancel in the relative density, a.k.a. the Radon-Nikodym derivative. This is visible in the rightmost plot, where we see that the relative density is fine. The density is not continuous, but the value at each level where \(Y\) is flat is well-defined.
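The cancellation of point masses can be checked by hand in a stripped-down case. The sketch below assumes, purely for illustration, that \(Q\) is uniform on \([0,1]\), that \(P\) has density \(g(x) = 2x\) w.r.t. \(Q\), and that \(Y(x) = \max(x, 1/2)\), which is flat on \([0, 1/2]\). None of these are the \(P\), \(Q\) or \(Y\) plotted above. Both image measures get an atom at \(y = 1/2\), and the ratio of the two atoms matches the \(Q\)-average of \(g\) over the flat piece.

```python
from fractions import Fraction as F


def Q_cdf(x):
    """CDF of Q, assumed uniform on [0, 1] (hypothetical choice)."""
    return x


def P_cdf(x):
    """CDF of P, assumed to have density g(x) = 2x w.r.t. Q (hypothetical choice)."""
    return x * x


def g(x):
    """Density of P w.r.t. Q under the assumptions above."""
    return 2 * x


# Y(x) = max(x, 1/2) is flat on [a, b] = [0, 1/2], so Y has an atom at 1/2.
a, b = F(0), F(1, 2)
Q_atom = Q_cdf(b) - Q_cdf(a)  # Q_Y({1/2})
P_atom = P_cdf(b) - P_cdf(a)  # P_Y({1/2})
atom_ratio = P_atom / Q_atom  # relative density of P_Y w.r.t. Q_Y at y = 1/2

# E_Q[g | Y = 1/2]: the average of g over the flat piece under uniform Q.
# The midpoint rule is exact here because g is linear.
n = 100
midpoints = [a + (i + F(1, 2)) * (b - a) / n for i in range(n)]
cond_exp = sum(g(x) for x in midpoints) / n

print(Q_atom, P_atom, atom_ratio, cond_exp)
```

On the part where this \(Y\) is strictly increasing (inputs above \(1/2\)) nothing is merged, so the relative density there is simply \(g\) itself.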

Cantor

Just to spice things up, here we do the same exercise but with \(Y\) the Cantor function. That is, our random variable \(Y\) is given by this plot:

Specification of \(Y\)

The Cantor function is continuous and almost everywhere flat.

The resulting situation is shown below:

[Figure: three panels, described in the caption below.]

(left) CDF of \(P_Y\) and \(Q_Y\). (middle) Lebesgue densities of \(P_Y\) and \(Q_Y\). (right) density/likelihood ratio/Radon-Nikodym derivative of \(P_Y\) w.r.t. \(Q_Y\).

Again, the PDFs of \(P_Y\) and \(Q_Y\) go crazy. They should, as both are countable mixtures of point masses. So in the PDF plot we are seeing spikes (which are real) with heights (which are numerical artefacts). The crux of this post is that the relative density is, again, perfectly fine.

Finally, let’s look side-by-side at the relative density of \(P_X/Q_X\) and \(P_Y/Q_Y\):

[Figure: two panels, described in the caption below.]

Density of \(P\) w.r.t. \(Q\) for both the input variable \(X\) and for the Cantor function \(Y\). When squinting hard, these look similar.

This plot nicely illustrates the averaging rule \[\frac{d P_Y}{d Q_Y} ~=~ \operatorname*{\mathbb E}_Q\left[ \frac{d P_X}{d Q_X} \middle| Y \right].\] As such, the density is only determined at values of \(Y\) that occur with positive probability. In this case, those are the dyadic rationals. All in all, extremely intriguing!
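For the Cantor function the averaging can be carried out exactly on each flat piece. The sketch below again assumes, for illustration only, that \(Q\) is uniform on \([0,1]\) and that \(P\) has CDF \(x^2\); these are not the measures plotted in this post. It enumerates the middle-third intervals removed in the first few steps of the Cantor construction, on each of which \(Y\) is constant at a dyadic rational, and computes the relative density there as the ratio of the \(P\)-mass to the \(Q\)-mass of the interval.

```python
from fractions import Fraction as F


def flat_pieces(depth):
    """Middle-third intervals removed in the first `depth` construction steps.

    Returns tuples (left, right, value), where `value` is the constant
    (dyadic rational) value of the Cantor function on that interval.
    """
    pieces = []
    # Each entry: a surviving interval [lo, hi] and the range [vlo, vhi]
    # that the Cantor function sweeps out on it.
    todo = [(F(0), F(1), F(0), F(1))]
    for _ in range(depth):
        nxt = []
        for lo, hi, vlo, vhi in todo:
            third = (hi - lo) / 3
            vmid = (vlo + vhi) / 2
            pieces.append((lo + third, hi - third, vmid))
            nxt.append((lo, lo + third, vlo, vmid))
            nxt.append((hi - third, hi, vmid, vhi))
        todo = nxt
    return pieces


def Q_cdf(x):
    """CDF of Q, assumed uniform on [0, 1] (hypothetical choice)."""
    return x


def P_cdf(x):
    """CDF of P, assumed to be x^2 (hypothetical choice)."""
    return x * x


def relative_density(depth):
    """dP_Y/dQ_Y at each dyadic value reached within `depth` steps."""
    return {
        y: (P_cdf(hi) - P_cdf(lo)) / (Q_cdf(hi) - Q_cdf(lo))
        for lo, hi, y in flat_pieces(depth)
    }


depth = 3
f = relative_density(depth)
# Total Q-mass of the atoms found so far; it tends to 1 as depth grows.
covered = sum(Q_cdf(hi) - Q_cdf(lo) for lo, hi, _ in flat_pieces(depth))
for y in sorted(f):
    print(y, f[y])
```

Under these assumptions the atoms found after `depth` steps carry \(Q\)-mass \(1 - (2/3)^{\text{depth}}\), so the dyadic rationals account for all of \(Q_Y\) in the limit.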