INFOTH Note 9: Differential Entropy
9.1 Differential Entropy
For discrete r.v.s we have the (Shannon) entropy. For real-valued r.v.s with density functions we need a different notion, the differential entropy.
Def: Diff. Ent.
Let $X\sim f$ be a real-valued r.v. with density $f$, meaning
$$ \Pr[X\in[a,b]]=\int_a^b f(x)\mathrm d x,\quad \int_{-\infty}^\infty f(x)\mathrm d x=1. $$
We define (writing $\int_\R:=\int_{-\infty}^\infty$)
$$ h(X)=-\int_{\R} f(x)\log f(x)\mathrm d x. $$
In fact, $h(X)$ may fail to exist (the integral may be $\pm\infty$ or undefined) when
$$ \int_\R \max(-f(x)\log f(x),0)\,\mathrm d x=\infty,\text{ or } \int_\R \max(f(x)\log f(x),0)\,\mathrm d x=\infty. $$
Example: Uniform distribution
$$ X\sim\text{Unif}(a,b),\\ h(X)=\int_a^b \frac1{b-a}\log(b-a)\mathrm d x=\log(b-a) $$
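A quick numerical sanity check of this formula, as a minimal sketch (the Riemann-sum grid and the base-2 logarithm are choices of this demo, not part of the note):

```python
import numpy as np

# Approximate h(X) = -∫ f log2 f dx for X ~ Unif(a, b) by a Riemann sum.
def h_unif(a, b, n=100_000):
    x, dx = np.linspace(a, b, n, endpoint=False, retstep=True)
    f = np.full_like(x, 1.0 / (b - a))   # constant density on [a, b)
    return -np.sum(f * np.log2(f)) * dx

print(h_unif(0, 2), np.log2(2))      # ≈ 1.0 bit
print(h_unif(0, 0.5), np.log2(0.5))  # ≈ -1.0: already hints h can be negative
```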
Example: Gaussian
$$ X\sim\mathcal N(0,\sigma^2),\quad \sigma^2>0,\\ f(x)={1\over \sigma\sqrt{2\pi}}e^{-x^2/2\sigma^2},\quad x\in\R $$
The diff. entropy:
$$ \begin{aligned} h(X)&=-\int_\R {1\over \sigma\sqrt{2\pi}}e^{-x^2/2\sigma^2}\log \left({1\over \sigma\sqrt{2\pi}}e^{-x^2/2\sigma^2}\right)\mathrm d x \\&=-\log e\int_\R {1\over \sigma\sqrt{2\pi}}e^{-x^2/2\sigma^2}\ln\left({1\over \sigma\sqrt{2\pi}}e^{-x^2/2\sigma^2}\right)\mathrm d x \\&=-\log e\int_\R {1\over \sigma\sqrt{2\pi}}e^{-x^2/2\sigma^2}\left(-{x^2\over 2\sigma^2}-\ln(\sigma\sqrt{2\pi})\right)\mathrm d x \\&=\log e\cdot \ln(\sigma\sqrt{2\pi})+\log e\int_\R{1\over \sigma\sqrt{2\pi}}e^{-x^2/2\sigma^2}{x^2\over 2\sigma^2}\mathrm d x \\&=\log(\sigma\sqrt{2\pi})+\log e\cdot\mathrm E[X^2]\cdot\frac1{2\sigma^2} \\&=\log(\sigma\sqrt{2\pi})+\frac12\log e\qquad(\mathrm E[X^2]=\sigma^2) \\&=\frac12\log(2\pi e\sigma^2) \end{aligned} $$
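The closed form can be checked numerically; below is a minimal sketch (the $\pm10\sigma$ truncation and the grid size are assumptions of the demo):

```python
import numpy as np

# Approximate h(X) = -∫ f log2 f dx for X ~ N(0, σ²) on a truncated grid.
def h_gauss_numeric(sigma, n=200_000):
    x, dx = np.linspace(-10 * sigma, 10 * sigma, n, retstep=True)
    f = np.exp(-x**2 / (2 * sigma**2)) / (sigma * np.sqrt(2 * np.pi))
    return -np.sum(f * np.log2(f)) * dx

sigma = 1.5
print(h_gauss_numeric(sigma))                      # numerical value
print(0.5 * np.log2(2 * np.pi * np.e * sigma**2))  # ½ log2(2πeσ²), matches
```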
Differential Entropy is not Entropy
- It can be negative: in the uniform example, $h(X)=\log(b-a)<0$ whenever $b-a<1$.
- It is not the limit of the entropy of a quantized version of $X$; that limit diverges, as the next computation shows.
Quantization (Discretization)
Quantize the real line into intervals of width $\Delta$ to approximate the integral:
$$ \sum_{k=-\infty}^\infty -\Delta\cdot (f(x_k)\log f(x_k))\approx-\int_{\R} f(x)\log f(x)\mathrm d x $$
where $x_k=k\Delta$, i.e. $\cdots,-2\Delta,-\Delta,0,\Delta,\cdots$.
The entropy of $X$ quantized at interval $\Delta$ is then an ordinary entropy of countably many probability values summing to $1$:
$$ H(\cdots,p_{-2}^\Delta,p_{-1}^\Delta,p_0^\Delta,p_1^\Delta,p_2^\Delta,\cdots) =-\sum_k p_k^\Delta\log p_k^\Delta $$
where the probability value $p_k^\Delta$ is
$$ p_k^\Delta:=\Pr[X\in[k\Delta,(k+1)\Delta)]=\int_{k\Delta}^{(k+1)\Delta}f(x)\mathrm d x \approx f(x_k)\Delta $$
Then
$$ -p_k^\Delta\log p_k^\Delta\approx -f(x_k)\Delta\log (f(x_k)\Delta) =-f(x_k)\Delta\cdot\log(f(x_k))-f(x_k)\Delta\cdot\log \Delta $$
Hence
$$ H(\cdots,p_{-2}^\Delta,p_{-1}^\Delta,p_0^\Delta,p_1^\Delta,p_2^\Delta,\cdots) \approx-\sum_k f(x_k)\Delta \log f(x_k)-\log\Delta\cdot\sum_k \Delta f(x_k) $$
When $\Delta\to 0$,
$$ \sum_k\Delta \cdot f(x_k)\to\int_\R f(x)\mathrm d x=1 $$
and
$$ -\sum_k\Delta \cdot f(x_k)\log f(x_k)\to-\int_\R f(x)\log f(x)\mathrm d x=h(X), $$
hence
$$ H(\cdots,p_{-2}^\Delta,p_{-1}^\Delta,p_0^\Delta,p_1^\Delta,p_2^\Delta,\cdots) \to h(X)-\log \Delta,\quad\Delta\to 0. $$
Since $\log\Delta\to-\infty$, the quantized entropy $H$ diverges to $+\infty$ while $h(X)$ stays finite.
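This divergence is easy to see numerically. A sketch for $X\sim\mathcal N(0,1)$, where the cell probabilities come from the Gaussian CDF (the $\pm12$ truncation is an assumption of the demo):

```python
import numpy as np
from scipy.stats import norm

h_true = 0.5 * np.log2(2 * np.pi * np.e)  # h(X) for σ = 1, ≈ 2.047 bits

for delta in [1.0, 0.1, 0.01]:
    edges = np.arange(-12.0, 12.0 + delta, delta)
    p = np.diff(norm.cdf(edges))   # p_k = Pr[X ∈ [kΔ, (k+1)Δ)]
    p = p[p > 0]                   # drop numerically-zero tail cells
    H = -np.sum(p * np.log2(p))    # entropy of the quantized r.v.
    print(f"Δ={delta}: H_Δ = {H:.4f}, H_Δ + log2(Δ) = {H + np.log2(delta):.4f}")
# H_Δ blows up as Δ → 0, but H_Δ + log2(Δ) settles near h(X) ≈ 2.0471.
```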
9.2 Differential Definitions of Other Quantities
Joint, Cond., MI
Joint Differential Entropy
$$ h(X,Y):=-\int_\R\int_\R f(x,y)\log f(x,y)\mathrm d x\mathrm d y\\ h(\mathbf X^n):=-\int_{\R^n}f(\mathbf x^n)\log f(\mathbf x^n)\mathrm d {\mathbf x}^n $$
Conditional Diff. Ent.
$$ h(X|Y):=-\int_\R f(y)\int_\R f(x|y)\log f(x|y)\mathrm d x\mathrm d y =-\int_\R\int_\R f(x,y)\log f(x|y)\mathrm d x\mathrm d y $$
Mutual Information (differentially)
$$ I(X;Y):=h(Y)-h(Y|X)=h(X)-h(X|Y)=h(X)+h(Y)-h(X,Y)=D(f(x,y)||f(x)f(y)) $$
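As a concrete check of $I(X;Y)=h(X)+h(Y)-h(X,Y)$, a sketch for a bivariate Gaussian with correlation $\rho$; it uses the standard joint-entropy formula $h(X,Y)=\frac12\log\big((2\pi e)^2\det\Sigma\big)$, a known fact assumed here rather than derived in this note:

```python
import numpy as np

rho = 0.8
Sigma = np.array([[1.0, rho], [rho, 1.0]])  # unit variances, correlation ρ

h_x = 0.5 * np.log2(2 * np.pi * np.e)       # h(X) = ½ log2(2πe·1), from 9.1
h_y = h_x                                   # same marginal for Y
h_xy = 0.5 * np.log2((2 * np.pi * np.e) ** 2 * np.linalg.det(Sigma))

print(h_x + h_y - h_xy)            # I(X;Y) via the identity above
print(-0.5 * np.log2(1 - rho**2))  # classic closed form, same value
```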
Relative Entropy ($f$ vs $g$)
$$ D(f||g):=\int_\R f(x)\log{f(x)\over g(x)}\mathrm d x $$
Note that $D(f||g)\ge 0$ even though differential entropies can be negative:
$$ D(f||g) =\int_\R g(x){f(x)\over g(x)}\log{f(x)\over g(x)}\mathrm d x \ge\left(\int_\R g(x){f(x)\over g(x)}\mathrm d x \right)\log \left(\int_\R g(x){f(x)\over g(x)}\mathrm d x \right)=0, $$
by Jensen's inequality applied to the convex function $t\mapsto t\log t$ under the density $g$, together with $\int_\R f(x)\mathrm d x=1$. Hence
$$ D(f||g)=\begin{cases}\int_\R f(x)\log{f(x)\over g(x)}\mathrm d x,&g(x)=0\implies f(x)=0,\\ \infty,&\text{otherwise}.\end{cases} $$
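A numerical check that $D(f||g)\ge 0$, as a sketch comparing two Gaussian densities against the known closed form for Gaussian KL divergence (the grid and $\pm20$ truncation are choices of the demo):

```python
import numpy as np

mu1, s1, mu2, s2 = 0.0, 1.0, 1.0, 2.0
x, dx = np.linspace(-20, 20, 400_000, retstep=True)

def gauss(x, mu, s):
    return np.exp(-(x - mu) ** 2 / (2 * s**2)) / (s * np.sqrt(2 * np.pi))

f, g = gauss(x, mu1, s1), gauss(x, mu2, s2)
D_num = np.sum(f * np.log2(f / g)) * dx   # ∫ f log2(f/g) dx, Riemann sum

# Known closed form (in bits): (ln(s2/s1) + (s1² + (μ1−μ2)²)/(2 s2²) − ½)·log2(e)
D_exact = (np.log(s2 / s1) + (s1**2 + (mu1 - mu2) ** 2) / (2 * s2**2) - 0.5) * np.log2(np.e)
print(D_num, D_exact)   # both ≈ 0.639 bits, and nonnegative
```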
Intuition from discrete to continuous
Quantizing $f$ and $g$ gives the discrete distributions
$$ (p^\Delta(k),k\in\Z):p^\Delta (k)=\int_{k\Delta}^{(k+1)\Delta}f(x)\mathrm d x \\ (q^\Delta(k),k\in\Z):q^\Delta (k)=\int_{k\Delta}^{(k+1)\Delta}g(x)\mathrm d x $$
Then
$$ \lim_{\Delta\to 0}D(p^\Delta||q^\Delta)=D(f||g) $$
Intuition:
$$ \sum_{\mathrm d x}f(x)\mathrm d x\cdot {\log {f(x)\mathrm d x\over g(x)\mathrm d x}}=\sum_{\mathrm d x}f(x)\mathrm d x{\log {f(x)\over g(x)}}=\int_\R f{\log{f\over g}}\mathrm d x. $$
The $\mathrm d x$ factors cancel inside the logarithm, which is why no $-\log\Delta$ divergence appears here, unlike for the quantized entropy.
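This limit can also be observed numerically; a sketch with the same two Gaussians as above, where $p^\Delta(k),q^\Delta(k)$ are the cell probabilities from the CDFs (the $\pm20$ truncation is an assumption of the demo):

```python
import numpy as np
from scipy.stats import norm

f, g = norm(0, 1), norm(1, 2)
D_exact = (np.log(2.0) + (1 + 1) / 8 - 0.5) * np.log2(np.e)  # ≈ 0.6393 bits

for delta in [1.0, 0.1, 0.01]:
    edges = np.arange(-20.0, 20.0 + delta, delta)
    p, q = np.diff(f.cdf(edges)), np.diff(g.cdf(edges))  # p^Δ(k), q^Δ(k)
    mask = p > 0                   # keep cells where p^Δ(k) > 0
    D = np.sum(p[mask] * np.log2(p[mask] / q[mask]))
    print(f"Δ={delta}: D(p^Δ||q^Δ) = {D:.4f}  (target {D_exact:.4f})")
```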