INFOTH Note 2: Joint & Conditional Entropy, Chain Rule

2.1 Entropy

Def 2.1.1 Entropy

For a discrete r.v. $X$ on a domain $\mathcal X$ with finitely or countably infinitely many elements (such as a finite set or $\mathbb Z$), with distribution $p\in\Delta_{\mathcal X}$, we define:

$$ H(X)=-\sum_{x\in\mathcal X}p(x)\log p(x)\equiv E_{X\sim p} \log {1\over p(X)} $$
  • Here $E_p$ denotes the expectation taken with $X\sim p$, i.e. the expected value of a function of $X$ under the distribution $p$. This gives an equivalent definition of entropy as an expectation.

Properties

  1. $H(X)\ge 0$

  2. Changing the base: let $H_a(X)$ be the entropy defined with the logarithm taken in base $a$ instead of 2. Then

    $H_b(X)=(\log_b a)H_a(X)$

    • So $H_a(X)=(\log_a 2)H(X)$ (see the numerical check below).
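
As a quick numerical sanity check of the base-change identity, here is a minimal Python sketch (the example pmf is an arbitrary choice, not from the note):

```python
import math

def entropy(pmf, base=2):
    """Shannon entropy of a pmf given as a list of probabilities."""
    return -sum(q * math.log(q, base) for q in pmf if q > 0)

p = [0.5, 0.25, 0.125, 0.125]    # an arbitrary example pmf

H2 = entropy(p, base=2)          # H(X) in bits
He = entropy(p, base=math.e)     # H_e(X) in nats
# H_b(X) = (log_b a) * H_a(X); with b = 2 and a = e this recovers bits from nats:
print(H2, math.log2(math.e) * He)   # both are approximately 1.75
```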

2.1.2 Examples

1. Coin toss

$X\sim Bern(p)$

$H(X)=-p\log p-(1-p)\log (1-p)$.
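
A minimal Python sketch of this binary entropy function, evaluated at a few illustrative values of $p$ (the values are assumptions chosen only to show the shape):

```python
import math

def binary_entropy(p):
    """H(X) in bits for X ~ Bern(p)."""
    if p in (0.0, 1.0):
        return 0.0
    return -p * math.log2(p) - (1 - p) * math.log2(1 - p)

for p in (0.1, 0.25, 0.5, 0.75, 0.9):
    print(p, binary_entropy(p))   # symmetric in p and 1-p, peaking at 1 bit when p = 0.5
```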

2. A four-valued distribution

$$ X = \begin{cases} a,&\text{ with probability } \frac12,\\ b,&\text{ with probability } \frac14,\\ c,&\text{ with probability } \frac18,\\ d,&\text{ with probability } \frac18\end{cases} $$

hence

$$ H(X)=-\sum_{x\in\mathcal X}p(x)\log p(x)=-\frac12\log\frac12-\frac14\log\frac14-\frac18\log\frac18-\frac18\log\frac18 = \frac74\text{ bits}. $$

Interpretation:

  • We can guess the value of $X$ with binary (yes/no) questions:
    • Ask “Is X = a?”; if not, ask “Is X = b?”, then “Is X = c?”. The expected number of questions under this strategy is $7/4 = 1.75$, matching $H(X)$ (verified in the sketch below).
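
The following Python sketch checks both numbers for this example: the entropy of the distribution and the expected number of yes/no questions under the questioning strategy described above (the `questions` table simply encodes that strategy):

```python
import math

pmf = {"a": 1/2, "b": 1/4, "c": 1/8, "d": 1/8}

# Entropy of the distribution, in bits.
H = -sum(p * math.log2(p) for p in pmf.values())

# Questions used by the strategy "Is X = a?", "Is X = b?", "Is X = c?":
# 'a' is identified after 1 question, 'b' after 2, and 'c'/'d' after 3.
questions = {"a": 1, "b": 2, "c": 3, "d": 3}
expected_questions = sum(pmf[x] * questions[x] for x in pmf)

print(H, expected_questions)   # both equal 1.75
```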

2.2 Joint and Conditional Entropy

Def 2.2.1 Joint Entropy

$H(X,Y)=-\sum_{x\in{\cal X}}\sum_{y\in{\cal Y}} p(x,y)\log p(x,y)$

Equivalently: $H(X,Y)=-E \log p(X,Y)$.
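
A minimal Python sketch of computing $H(X,Y)$ directly from a joint pmf (the toy joint distribution here is an assumed example, not from the note):

```python
import math

# Toy joint pmf p(x, y), keyed by (x, y) pairs; an assumed example.
p_xy = {
    (0, 0): 1/2,
    (0, 1): 1/4,
    (1, 0): 1/8,
    (1, 1): 1/8,
}

# H(X, Y) = -E[log p(X, Y)], computed as a sum over the support.
H_XY = -sum(p * math.log2(p) for p in p_xy.values() if p > 0)
print(H_XY)   # 1.75 bits for this toy pmf
```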

Def 2.2.2 Conditional Entropy

If $(X,Y)\sim p(x,y)$ (their joint distribution),

the conditional entropy $H(Y|X)$ is defined as the average of $H(Y|X=x)$ over $x$, weighted by $p(x)$:

$$ \begin{aligned} H(Y|X)&=\sum_{x\in\mathcal X}p(x)H(Y|X=x) \\&= -\sum_{x\in\mathcal X}p(x)\sum_{y\in\mathcal Y}p(y|x)\log p(y|x) \\&= -\sum_{x\in\mathcal X}\sum_{y\in\mathcal Y}p(x,y)\log p(y|x) \\&= -{E}_{X,Y}[\log p(Y|X)], \end{aligned} $$

Here, $H(Y|X=x)$ is simply the entropy of the conditional distribution $p(y|x)$.
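
A minimal Python sketch of the definition $H(Y|X)=\sum_x p(x)H(Y|X=x)$, using an assumed toy joint pmf:

```python
import math
from collections import defaultdict

# Assumed toy joint pmf p(x, y).
p_xy = {(0, 0): 1/2, (0, 1): 1/4, (1, 0): 1/8, (1, 1): 1/8}

# Marginal p(x).
p_x = defaultdict(float)
for (x, _), p in p_xy.items():
    p_x[x] += p

# H(Y|X) = sum_x p(x) H(Y|X=x), where H(Y|X=x) is the entropy of p(y|x).
H_Y_given_X = 0.0
for x, px in p_x.items():
    cond = [p / px for (xx, _), p in p_xy.items() if xx == x]   # p(y|x) for this x
    H_Y_given_X += px * -sum(q * math.log2(q) for q in cond if q > 0)

print(H_Y_given_X)
```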

Independence

When $X\perp\!\!\!\perp Y$,

$$ H(X)=H(X|Y);\quad H(Y)=H(Y|X);\quad H(X,Y)=H(X)+H(Y) $$
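
A quick numerical check that $H(X,Y)=H(X)+H(Y)$ under independence, using assumed marginals and the product joint pmf $p(x,y)=p(x)p(y)$:

```python
import math

def entropy(pmf):
    return -sum(p * math.log2(p) for p in pmf if p > 0)

# Assumed marginals; under independence p(x, y) = p(x) p(y).
p_x = [0.7, 0.3]
p_y = [0.5, 0.25, 0.25]
p_xy = [px * py for px in p_x for py in p_y]

print(entropy(p_xy), entropy(p_x) + entropy(p_y))   # equal under independence
```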

Theorem 2.2.3 The Chain Rule

$$ \begin{aligned} H(Y|X)&=-\sum_{x\in\mathcal X}\sum_{y\in\mathcal Y}p(x,y)\log p(y|x) \\&=-\sum_{x\in\mathcal X}\sum_{y\in\mathcal Y}p(x,y)\log{p(x,y)\over p(x)} \\&=-\sum_{x\in\mathcal X}\sum_{y\in\mathcal Y}p(x,y)\log{p(x,y)}+\sum_{x\in\mathcal X}\log p(x)\sum_{y\in\mathcal Y}p(x,y) \\&=-\sum_{x\in\mathcal X}\sum_{y\in\mathcal Y}p(x,y)\log{p(x,y)}+\sum_{x\in\mathcal X}p(x)\log{p(x)} \\&=H(X,Y)-H(X) \end{aligned} $$

This derivation shows that

$$ H(X,Y)=H(X)+H(Y|X) $$

which is natural; equivalently, it follows from $p(X,Y)=p(X)\,p(Y|X)$, since

$$ \log p(X,Y)=\log p(X)+\log p(Y|X) $$

Taking the expectation of both sides over $(X,Y)$ and negating gives the theorem above.
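
A numerical sketch of the chain rule $H(X,Y)=H(X)+H(Y|X)$ on an assumed dependent joint pmf:

```python
import math
from collections import defaultdict

# Assumed joint pmf with dependent X and Y.
p_xy = {(0, 0): 0.4, (0, 1): 0.1, (1, 0): 0.2, (1, 1): 0.3}

H_XY = -sum(p * math.log2(p) for p in p_xy.values())

p_x = defaultdict(float)
for (x, _), p in p_xy.items():
    p_x[x] += p
H_X = -sum(p * math.log2(p) for p in p_x.values())

# H(Y|X) = -E[log p(Y|X)] with p(y|x) = p(x, y) / p(x).
H_Y_given_X = -sum(p * math.log2(p / p_x[x]) for (x, _), p in p_xy.items())

print(H_XY, H_X + H_Y_given_X)   # the two sides of the chain rule agree
```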

Chain Rule Corollary (Chain Rule for Conditional Entropy)

$$ H(X,Y|Z)=H(X|Z)+H(Y|X,Z) $$

Treating $p(\cdot|z)$ and $H(\cdot|Z)$ as an ordinary distribution $\hat p(\cdot)$ and entropy $\hat H(\cdot)$, the chain rule for conditional entropy follows from the ordinary chain rule.
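
A sketch verifying the corollary $H(X,Y|Z)=H(X|Z)+H(Y|X,Z)$ numerically; the three-variable pmf and the `marginal` helper are assumptions written only for this check:

```python
import math
from collections import defaultdict

# Assumed strictly positive joint pmf over (x, y, z) in {0,1}^3.
p = {(0,0,0): 0.05, (0,0,1): 0.15, (0,1,0): 0.10, (0,1,1): 0.10,
     (1,0,0): 0.20, (1,0,1): 0.05, (1,1,0): 0.15, (1,1,1): 0.20}

def marginal(idxs):
    """Marginal pmf of the coordinates listed in idxs."""
    m = defaultdict(float)
    for k, pv in p.items():
        m[tuple(k[i] for i in idxs)] += pv
    return m

p_z  = marginal((2,))
p_xz = marginal((0, 2))

# H(X,Y|Z) = -E[log p(x,y|z)] = -E[log (p(x,y,z) / p(z))]
H_XY_given_Z = -sum(pv * math.log2(pv / p_z[(z,)]) for (x, y, z), pv in p.items())
# H(X|Z) = -E[log (p(x,z) / p(z))]
H_X_given_Z = -sum(pv * math.log2(p_xz[(x, z)] / p_z[(z,)]) for (x, y, z), pv in p.items())
# H(Y|X,Z) = -E[log (p(x,y,z) / p(x,z))]
H_Y_given_XZ = -sum(pv * math.log2(pv / p_xz[(x, z)]) for (x, y, z), pv in p.items())

print(H_XY_given_Z, H_X_given_Z + H_Y_given_XZ)   # equal, confirming the corollary
```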

Property: $H(X)-H(X|Y)=H(Y)-H(Y|X)$

Note that in general $H(Y|X)\ne H(X|Y)$, but $H(X)+H(Y|X)=H(Y)+H(X|Y)$, which gives $H(X)-H(X|Y)=H(Y)-H(Y|X)$.

Since

$$ H(Y|X)+H(X)=H(X,Y)=H(X|Y)+H(Y) $$

we get

$$ H(X)-H(X|Y)=H(Y)-H(Y|X) $$

Theorem 2.2.4 General Chain Rule

$$ H(X_1\dots X_n) =H(X_1)+H(X_2|X_1)+\dots+H(X_n|X_{n-1}\dots X_1) =\sum_{i=1}^nH(X_i|X_{1:i-1}) $$
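
A sketch checking the general chain rule for $n=3$ on an assumed joint pmf (the `prefix_marginal` helper is hypothetical, written only for this check):

```python
import math
from collections import defaultdict

# Assumed joint pmf over (x1, x2, x3).
p = {(0,0,0): 0.10, (0,0,1): 0.15, (0,1,0): 0.05, (0,1,1): 0.20,
     (1,0,0): 0.20, (1,0,1): 0.10, (1,1,0): 0.15, (1,1,1): 0.05}

def prefix_marginal(i):
    """Marginal pmf of (X_1, ..., X_i); for i = 0 this is the trivial pmf summing to 1."""
    m = defaultdict(float)
    for k, pv in p.items():
        m[k[:i]] += pv
    return m

H_joint = -sum(pv * math.log2(pv) for pv in p.values())

# H(X_i | X_{1:i-1}) = -E[ log ( p(x_{1:i}) / p(x_{1:i-1}) ) ]
total = 0.0
for i in range(1, 4):
    m_prev = prefix_marginal(i - 1)
    m_curr = prefix_marginal(i)
    total += -sum(pv * math.log2(m_curr[k[:i]] / m_prev[k[:i - 1]])
                  for k, pv in p.items())

print(H_joint, total)   # equal: H(X1,X2,X3) = sum_i H(X_i | X_{1:i-1})
```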

2.3 Summary

  1. Conditional entropy
    • $H(Y|X)$ is a scalar value (averaged over $X$), not a random variable that is a function of $X$; this differs from the usual convention for conditioning in probability theory:
      • $H(Y|X)=\sum_x p(x)\,H(Y|X=x)$
  2. Joint entropy $H(X_1,\dots,X_n)$, e.g. $H(X,Y)$
  3. Chain rule, following the chain rule in probability theory