2025-01-26
We consider the rule for obtaining the density of a function of a random variable from the density of that random variable.
Fix a probability measure \(Q\), and consider another probability measure \(P\) defined by density \(g\), i.e. \[d P ~=~ g \, d Q.\] Now consider a random variable \(Y\). We are interested in the density of the law \(P_Y\) of \(Y\) under \(P\) w.r.t. the law \(Q_Y\) of \(Y\) under \(Q\). So we ask whether there is some \(f\) such that \[d P_Y ~=~ f \, d Q_Y.\] The observation of this post is that the following rule works: \begin{equation}\label{eq:rule} f ~=~ \operatorname*{\mathbb E}_Q[ g | Y] \end{equation} In this post we test drive this rule on some evil choices of \(Y\). What if \(Y\) is many-to-one? Contracting to a point? Fractal? But first, let’s prove the rule.
Fix any event \(B\) in \(Y\) space. Then we need to show \[P_Y(B) ~=~ \int_B \operatorname*{\mathbb E}_Q[g|Y] d Q_Y .\] Let’s start on the right. Then \[\int_B \operatorname*{\mathbb E}_Q[g|Y] d Q_Y ~=~ \int \mathbf 1_B(Y) \operatorname*{\mathbb E}_Q[g|Y] d Q ~=~ \operatorname*{\mathbb E}_Q \left[ \operatorname*{\mathbb E}_Q[\mathbf 1_B(Y) g|Y]\right] ~=~ \operatorname*{\mathbb E}_Q[\mathbf 1_B(Y) g] ~=~ \operatorname*{\mathbb E}_P[\mathbf 1_B(Y)] ~=~ P(Y \in B) ~=~ P_Y(B) .\] The steps are as follows. First we change variables from \(Q_Y\) to \(Q\). Then we move the indicator \(\mathbf 1_B(Y)\), which is constant given \(Y\), into the inner expectation. Then we apply the tower rule. Then we use that \(g\) is the density of \(P\) w.r.t. \(Q\). Finally we rewrite the expectation as a probability under \(P\), and then under \(P_Y\). All bona fide.
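On a finite sample space the rule can be checked by hand, because every integral becomes a sum. The following is a minimal sketch under assumed inputs: the six-point space, the uniform \(Q\), the density `g` and the labels in `Y` are all made up for illustration and are not the measures used in the plots of this post.

```python
from collections import defaultdict
from fractions import Fraction as F

# Assumed reference measure Q: uniform on the sample space {0, ..., 5}.
Q = [F(1, 6)] * 6
# Assumed density g = dP/dQ; it has to average to 1 under Q.
g = [F(0), F(1, 2), F(1), F(3, 2), F(2), F(1)]
assert sum(q * gi for q, gi in zip(Q, g)) == 1
# P is defined through dP = g dQ.
P = [q * gi for q, gi in zip(Q, g)]

# Assumed many-to-one random variable Y, given as one label per sample point.
Y = ["a", "a", "b", "b", "b", "c"]


def pushforward(measure, labels):
    """Law of Y: add up the mass of all sample points with the same label."""
    law = defaultdict(F)
    for mass, label in zip(measure, labels):
        law[label] += mass
    return dict(law)


P_Y = pushforward(P, Y)
Q_Y = pushforward(Q, Y)

# Left-hand side: the density of P_Y w.r.t. Q_Y, computed as a ratio of masses.
ratio = {y: P_Y[y] / Q_Y[y] for y in Q_Y}

# Right-hand side: E_Q[g | Y = y], the Q-weighted average of g over {Y = y}.
cond_exp = {
    y: sum(q * gi for q, gi, label in zip(Q, g, Y) if label == y) / Q_Y[y]
    for y in Q_Y
}

print(ratio)
print(cond_exp)
```

Both dictionaries agree, which is the discrete version of the chain of equalities above: summing \(g \, dQ\) over the preimage of a value and dividing by its \(Q\)-mass is exactly the conditional expectation.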
We now illustrate the rule \(\eqref{eq:rule}\) for some choices for the measures \(P\), \(Q\) and the random variable \(Y\). We will consider several candidate \(Y\) below, and fix \(P\) and \(Q\) throughout as follows (we sometimes write \(P_X\) for the original measure to stress the contrast with the measure \(P_Y\) induced by \(Y\)).
Both \(P\) and \(Q\) have Lebesgue densities, as shown in the middle plot. \(P\) is also absolutely continuous w.r.t. \(Q\), and hence has a density/likelihood ratio/Radon-Nikodym derivative w.r.t. \(Q\). Note that \(P\) has a region around \(X=1/2\) where it assigns probability zero while \(Q\) does not, but that is allowed. The reverse would not be: a region with positive \(P\)-probability and zero \(Q\)-probability would break absolute continuity.
Here we take as our random variable \(Y\) the function given by this plot:
Here \(Y\) maps \([0,1]\) to \([-1/10, 1/2]\). Moreover, \(Y\) is not one-to-one: either two inputs or a whole input interval are mapped to the same output value. The resulting situation in terms of CDFs, PDFs, and relative density is shown in the graphs below:
What we see is that \(P_Y\) and \(Q_Y\) do not have Lebesgue densities. The reason is that they have point-masses at the values of \(Y\) where the graph of \(Y\) is flat. This is visible as the vertical spikes in the middle PDF graph. But, and this is the point of this post, these problems cancel in the relative density aka Radon-Nikodym derivative. This is visible in the rightmost plot, where we see that the relative density is fine. The density is not continuous, but the value at each level where \(Y\) is flat is well-defined.
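To make the cancellation concrete, here is a small sketch with stand-in ingredients. It assumes \(Q\) uniform on \([0,1]\), the density \(g(x) = 2x\), and a clamping map \(Y\) that is flat on \([0, 1/4]\) and on \([3/4, 1]\); none of these are the choices plotted above. At a flat level the relative density is the ratio of the two point masses, which is the \(Q\)-average of \(g\) over the flat interval.

```python
def g(x):
    # Assumed density dP/dQ, with Q uniform on [0, 1] (not the post's choice).
    return 2.0 * x


def Y(x):
    # Assumed random variable: flat on [0, 1/4] and [3/4, 1], identity in between.
    return min(max(x, 0.25), 0.75)


def q_mass(a, b):
    # Q-probability of the interval [a, b] under the uniform measure.
    return b - a


def p_mass(a, b, n=10_000):
    # P-probability of [a, b]: midpoint-rule integral of g over the interval.
    h = (b - a) / n
    return sum(g(a + (i + 0.5) * h) for i in range(n)) * h


def relative_density(y):
    """Value of dP_Y/dQ_Y at y for the assumed g and Y."""
    if y == 0.25:
        # Atom: all of [0, 1/4] is mapped to 1/4, so divide the two point masses.
        return p_mass(0.0, 0.25) / q_mass(0.0, 0.25)
    if y == 0.75:
        # Atom: all of [3/4, 1] is mapped to 3/4.
        return p_mass(0.75, 1.0) / q_mass(0.75, 1.0)
    if 0.25 < y < 0.75:
        # Y is the identity here, so the density is simply g.
        return g(y)
    raise ValueError("y is outside the range of Y")


for y in (0.25, 0.3, 0.5, 0.75):
    print(y, round(relative_density(y), 6))
```

In this toy setup both \(P_Y\) and \(Q_Y\) have atoms at \(1/4\) and \(3/4\), yet their ratio is finite at both levels. The relative density jumps at the atoms (about \(0.25\) at \(y = 1/4\) versus values near \(0.5\) just above it), so it is well-defined but not continuous.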
Just to spice things up, here we do the same exercise but with \(Y\) the Cantor function. That is, our random variable \(Y\) is given by this plot:
The Cantor function is continuous and flat almost everywhere: it is constant on each middle-third interval removed in the construction of the Cantor set.
The resulting situation is shown below:
Again, the PDFs of \(P_Y\) and \(Q_Y\) go crazy. They should, as each is a countable mixture of point masses. So in the PDF plot we are seeing spikes (which are real) with heights (which are numerical artefacts). The crux of this post is that the relative density is, again, perfectly fine.
Finally, let’s look side-by-side at the relative density of \(P_X/Q_X\) and \(P_Y/Q_Y\):
This plot nicely illustrates the averaging rule \[\frac{d P_Y}{d Q_Y} ~=~ \operatorname*{\mathbb E}_Q\left[ \frac{d P_X}{d Q_X} \middle| Y \right]\] As such, the density is only defined at values of \(Y\) that occur with positive probability. In this case, those are the dyadic rationals. All in all, extremely intriguing!
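The averaging rule for the Cantor function can be worked out exactly at the first few dyadic levels. The sketch below again assumes stand-in measures, namely \(Q\) uniform on \([0,1]\) and \(dP/dQ = 2x\), which are not the measures plotted in this post. The helper `flat_pieces` is a hypothetical name; it enumerates the removed middle thirds together with the value the Cantor function takes on each.

```python
from fractions import Fraction as F


def flat_pieces(depth, lo=F(0), hi=F(1), ylo=F(0), yhi=F(1)):
    """Yield (a, b, y): the Cantor function equals y on the middle third (a, b).

    Recurses `depth` levels deep, so it yields 2**depth - 1 intervals.
    """
    if depth == 0:
        return
    third = (hi - lo) / 3
    a, b = lo + third, hi - third
    y = (ylo + yhi) / 2
    yield from flat_pieces(depth - 1, lo, a, ylo, y)
    yield a, b, y
    yield from flat_pieces(depth - 1, b, hi, y, yhi)


def q_mass(a, b):
    # Assumed Q: uniform on [0, 1].
    return b - a


def p_mass(a, b):
    # Assumed P with dP/dQ = 2x: the integral of 2x over [a, b].
    return b * b - a * a


pieces = list(flat_pieces(3))

# Relative density at each dyadic level: the ratio of the two point masses,
# i.e. the Q-average of dP/dQ over the interval on which Y is flat.
density = {y: p_mass(a, b) / q_mass(a, b) for a, b, y in pieces}

for y in sorted(density):
    print(y, density[y])

# Q-mass captured by the flat pieces up to this depth.
print(sum(q_mass(a, b) for a, b, _ in pieces))
```

With these assumptions the density at \(y = 1/2\) is the average of \(2x\) over \([1/3, 2/3]\), which is \(1\), and at \(y = 1/4\) it is the average over \([1/9, 2/9]\), which is \(1/3\). The flat pieces up to depth \(n\) carry \(Q\)-mass \(1 - (2/3)^n\), which tends to \(1\), so the dyadic rationals account for all of the probability.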