EE229 Lecture Note 2

2.1 Entropy

We define the entropy of a discrete random variable $X$ with probability mass function $p(x)$ as $$ H(X)=-\sum_{x\in\mathcal X}p(x)\log p(x)\equiv E_p \log {1\over p(X)} $$

  • where $\log$ denotes $\log_2$, and we adopt the convention $0\log 0=0$ (a definition by convention, justified by continuity since $x\log x\to 0$ as $x\to 0^+$).
  • $E_p$ denotes the expectation of a function of $X$ when $X$ is distributed according to $p$. This gives an equivalent definition of entropy as an expectation.

Properties

  1. $H(X)\ge 0$, since $0\le p(x)\le 1$ implies $\log\frac{1}{p(x)}\ge 0$.

  2. Changing the base: let $H_a(X)$ denote the entropy computed with logarithm base $a$ instead of 2. Then

    $H_b(X)=(\log_b a)H_a(X)$

    • So $H(X)=(\log_2 a)\,H_a(X)$; equivalently, $H_a(X)=(\log_a 2)\,H(X)$.
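
A quick numerical check of this base-change identity (a minimal Python sketch; the `entropy` helper and the example pmf are my own illustration, not from the notes):

```python
import math

def entropy(pmf, base=2):
    """Entropy of a pmf (list of probabilities) computed with the given log base."""
    return -sum(p * math.log(p, base) for p in pmf if p > 0)

pmf = [0.5, 0.25, 0.125, 0.125]        # an arbitrary example pmf
H_bits = entropy(pmf, base=2)          # H(X), in bits
H_nats = entropy(pmf, base=math.e)     # H_e(X), in nats

# H(X) = (log_2 e) * H_e(X)
assert math.isclose(H_bits, math.log2(math.e) * H_nats)
print(H_bits, H_nats)                  # 1.75 bits ≈ 1.213 nats
```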

Examples

1. Coin toss

$X\sim Bern(p)$

$H(X)=-p\log p-(1-p)\log (1-p)$, the binary entropy function.
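
A small sketch of the binary entropy function in Python (the helper name `binary_entropy` is mine, not from the notes):

```python
import math

def binary_entropy(p):
    """H(p) = -p log2 p - (1-p) log2 (1-p), with the convention 0 log 0 = 0."""
    if p in (0.0, 1.0):
        return 0.0
    return -p * math.log2(p) - (1 - p) * math.log2(1 - p)

print(binary_entropy(0.5))    # 1.0 bit: a fair coin is maximally uncertain
print(binary_entropy(0.11))   # ~0.5 bits
print(binary_entropy(1.0))    # 0.0 bits: a deterministic toss carries no uncertainty
```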

2. A four-valued distribution

$$ X = \begin{cases} a,&\text{ with probability } \frac12,\\ b,&\text{ with probability } \frac14,\\ c,&\text{ with probability } \frac18,\\ d,&\text{ with probability } \frac18\end{cases} $$ hence $$ H(X)=-\sum_{x\in\mathcal X}p(x)\log p(x)=-\frac12\log\frac12-\frac14\log\frac14-\frac18\log\frac18-\frac18\log\frac18 = \frac74\ \text{bits}. $$

Interpretation:

  • Guess the value of $X$ by asking binary (yes/no) questions.
    • Ask “Is X = a?”; if not, ask “Is X = b?”, then “Is X = c?”. The expected number of questions under this strategy is $\frac12\cdot1+\frac14\cdot2+\frac18\cdot3+\frac18\cdot3=\frac74=1.75$, which equals $H(X)$; see the sketch below.
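
A quick check of this counting in Python (the dictionaries below simply encode the strategy described above):

```python
import math

# The strategy above: ask "Is X = a?", then "Is X = b?", then "Is X = c?".
pmf = {'a': 1/2, 'b': 1/4, 'c': 1/8, 'd': 1/8}
questions = {'a': 1, 'b': 2, 'c': 3, 'd': 3}   # 'd' is settled by the answer to the third question

expected_questions = sum(pmf[x] * questions[x] for x in pmf)
H = -sum(p * math.log2(p) for p in pmf.values())
print(expected_questions, H)   # both equal 1.75
```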

2.2 Joint and Conditional Entropy

Definition of Joint Entropy

$H(X,Y)=-\sum_{x\in{\cal X}}\sum_{y\in{\cal Y}} p(x,y)\log p(x,y)$

Equivalently: $H(X,Y)=-E \log p(X,Y)$.

Definition of Conditional Entropy

If $(X,Y)\sim p(x,y)$ (the joint distribution),

the conditional entropy $H(Y|X)$ is defined as the average of $H(Y|X=x)$ over $x$, weighted by $p(x)$: $$ \begin{aligned} H(Y|X)&=\sum_{x\in\mathcal X}p(x)H(Y|X=x) \\&= -\sum_{x\in\mathcal X}p(x)\sum_{y\in\mathcal Y}p(y|x)\log p(y|x) \\&= -\sum_{x\in\mathcal X}\sum_{y\in\mathcal Y}p(x,y)\log p(y|x) \\&= -{E}_{X,Y}\log p(Y|X). \end{aligned} $$ Here $H(Y|X=x)$ is the entropy of the conditional distribution $p(y|x)$ for a fixed $x$.
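
A short Python sketch checking that the two forms above, the weighted average $\sum_x p(x)H(Y|X=x)$ and $-E\log p(Y|X)$, agree (the joint pmf `p_xy` is a hypothetical example of my own):

```python
import numpy as np

def entropy_bits(p):
    """Entropy in bits of an array of probabilities (zero entries are skipped)."""
    p = np.asarray(p, dtype=float).ravel()
    p = p[p > 0]
    return -np.sum(p * np.log2(p))

# Hypothetical joint pmf p(x, y); rows index x, columns index y.
p_xy = np.array([[0.5,   0.25],
                 [0.125, 0.125]])
p_x = p_xy.sum(axis=1)                          # marginal p(x)

# Form 1: sum_x p(x) * H(Y | X = x), the entropy of each conditional row p(y|x).
form1 = sum(p_x[i] * entropy_bits(p_xy[i] / p_x[i]) for i in range(p_xy.shape[0]))

# Form 2: -E log p(Y|X), expectation taken over the joint distribution.
form2 = -np.sum(p_xy * np.log2(p_xy / p_x[:, None]))

assert np.isclose(form1, form2)
print(form1)   # H(Y|X) in bits
```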

The Chain Rule

$$ \begin{aligned} H(Y|X)&=-\sum_{x\in\mathcal X}\sum_{y\in\mathcal Y}p(x,y)\log p(y|x) \\&=-\sum_{x\in\mathcal X}\sum_{y\in\mathcal Y}p(x,y)\log{p(x,y)\over p(x)} \\&=-\sum_{x\in\mathcal X}\sum_{y\in\mathcal Y}p(x,y)\log{p(x,y)}+\sum_{x\in\mathcal X}\sum_{y\in\mathcal Y}p(x,y)\log{p(x)} \\&=-\sum_{x\in\mathcal X}\sum_{y\in\mathcal Y}p(x,y)\log{p(x,y)}+\sum_{x\in\mathcal X}p(x)\log{p(x)} \\&=H(X,Y)-H(X) \end{aligned} $$

From this derivation we obtain the chain rule $$ H(X,Y)=H(X)+H(Y|X). $$ Equivalently, it follows directly from $p(X,Y)=p(X)p(Y|X)$, since:

$$ \log p(X,Y)=\log p(X)+\log p(Y|X) $$

Taking expectations on both sides yields the identity above.

Chain Rule Corollary (Conditional Entropy)

$$ H(X,Y|Z)=H(X|Z)+H(Y|X,Z) $$

Treat $p(\cdot\mid z)$ and $H(\cdot\mid Z)$ as a pmf $\hat p(\cdot)$ and its entropy $\hat H(\cdot)$ in their own right; the corollary is then just the chain rule applied under this conditioning, as spelled out below.
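
Spelled out: condition on $Z=z$, apply the ordinary chain rule to the conditional pmf $p(x,y|z)$, and average over $z$:

$$ \begin{aligned} H(X,Y|Z)&=\sum_{z\in\mathcal Z}p(z)\,H(X,Y|Z=z) \\&=\sum_{z\in\mathcal Z}p(z)\big[H(X|Z=z)+H(Y|X,Z=z)\big] \\&=H(X|Z)+H(Y|X,Z) \end{aligned} $$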

Property: $H(X)-H(X|Y)=H(Y)-H(Y|X)$

Note that in general $H(Y|X)\ne H(X|Y)$, but $H(X)+H(Y|X)=H(X,Y)=H(Y)+H(X|Y)$ by the chain rule, which leads to $H(X)-H(X|Y)=H(Y)-H(Y|X)$.
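
A numerical check of the chain rule and of this symmetry (a sketch reusing the same kind of hypothetical joint pmf as above; not part of the original notes):

```python
import numpy as np

def entropy_bits(p):
    """Entropy in bits of an array of probabilities (zero entries are skipped)."""
    p = np.asarray(p, dtype=float).ravel()
    p = p[p > 0]
    return -np.sum(p * np.log2(p))

# Hypothetical joint pmf p(x, y); rows index x, columns index y.
p_xy = np.array([[0.5,   0.25],
                 [0.125, 0.125]])
p_x = p_xy.sum(axis=1, keepdims=True)   # p(x), as a column
p_y = p_xy.sum(axis=0, keepdims=True)   # p(y), as a row

H_XY = entropy_bits(p_xy)
H_X, H_Y = entropy_bits(p_x), entropy_bits(p_y)
H_Y_given_X = -np.sum(p_xy * np.log2(p_xy / p_x))   # definition of H(Y|X)
H_X_given_Y = -np.sum(p_xy * np.log2(p_xy / p_y))   # definition of H(X|Y)

assert np.isclose(H_XY, H_X + H_Y_given_X)                 # chain rule
assert np.isclose(H_X - H_X_given_Y, H_Y - H_Y_given_X)    # the symmetric quantity above
print(H_X - H_X_given_Y)   # this common value will be called the mutual information I(X;Y)
```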

2.3 Relative Entropy (KL Distance) and Mutual Information

Definition of Relative Entropy (KL Divergence)

The relative entropy, also called the Kullback-Leibler (KL) distance or KL divergence, between two probability mass functions $p(x)$ and $q(x)$ is defined as $$ D(p||q)=\sum_{x\in\mathcal X}p(x)\log{p(x)\over q(x)}\equiv E_p\log{p(X)\over q(X)} $$
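
A minimal Python sketch of this definition (the helper name `kl_divergence` and the two example pmfs are my own illustration; it uses the standard conventions $0\log\frac{0}{q}=0$ and $p\log\frac{p}{0}=\infty$):

```python
import math

def kl_divergence(p, q):
    """D(p||q) = sum_x p(x) log2(p(x)/q(x)), with 0 log(0/q) = 0 and p log(p/0) = +inf."""
    d = 0.0
    for px, qx in zip(p, q):
        if px == 0:
            continue                 # convention: 0 log (0/q) = 0
        if qx == 0:
            return math.inf          # some x has p(x) > 0 but q(x) = 0
        d += px * math.log2(px / qx)
    return d

p = [0.5, 0.5]     # Bern(1/2)
q = [0.25, 0.75]   # Bern(1/4)
print(kl_divergence(p, q), kl_divergence(q, p))   # ~0.2075 vs ~0.1887: not symmetric in general
```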