EE229 Lecture Note 2
2.1 Entropy
We define the entropy of a discrete random variable $X$, with probability mass function $p(x)$ on alphabet $\mathcal X$, as $$ H(X)=-\sum_{x\in\mathcal X}p(x)\log p(x)\equiv E_p \log {1\over p(X)} $$
- where $\log$ is shorthand for $\log_2$, and we adopt the convention $0\log 0=0$ (the term is defined by convention, even though $\log 0$ itself is not "legal").
- Here $E_p$ denotes the expectation of a function of $X$ when $X$ is distributed according to $p$; this gives an equivalent definition of entropy as an expectation.
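As a quick illustration, here is a minimal Python sketch of this definition (the helper name `entropy` and the list-of-probabilities input are illustrative choices, not part of the notes):

```python
# Minimal sketch of H(X) = -sum_x p(x) log2 p(x), with the convention 0*log 0 = 0.
from math import log2

def entropy(pmf):
    """Entropy in bits of a pmf given as an iterable of probabilities."""
    return -sum(p * log2(p) for p in pmf if p > 0)  # skip p = 0 terms (0 log 0 = 0)

print(entropy([0.5, 0.5]))  # 1.0 bit: a fair coin
print(entropy([1.0]))       # 0.0 bits: a deterministic outcome
```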
Properties
- $H(X)\ge 0$.
- Changing the base of the logarithm: let $H_a(X)$ denote the entropy computed with logarithms to base $a$ instead of 2. Then
$H_b(X)=(\log_b a)H_a(X)$
- In particular, $H_a(X)=(\log_a 2)\,H(X)$.
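A small numerical check of the base-change identity, assuming the same list-of-probabilities representation as in the sketch above (here $a=e$, i.e., entropy measured in nats):

```python
# Check H_e(X) = (log_e 2) * H_2(X) for one pmf (entropy in nats vs. bits).
from math import log, log2

pmf = [1/2, 1/4, 1/8, 1/8]
H_bits = -sum(p * log2(p) for p in pmf)  # base-2 entropy: 1.75 bits
H_nats = -sum(p * log(p) for p in pmf)   # natural-log entropy
print(H_nats, log(2) * H_bits)           # both approximately 1.2130 nats
```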
Examples
1. Coin toss
$X\sim Bern(p)$,
$H(X)=-p\log p-(1-p)\log (1-p)$.
2. Distribution
$$ X = \begin{cases} a,&\text{ with probability } \frac12,\\ b,&\text{ with probability } \frac14,\\ c,&\text{ with probability } \frac18,\\ d,&\text{ with probability } \frac18\end{cases} $$ hence $$ H(X)=-\sum_{x\in\mathcal X}p(x)\log p(x)=-\frac12\log\frac12-\frac14\log\frac14-\frac18\log\frac18-\frac18\log\frac18 = \frac74 \text{ bits}. $$
Interpretation:
- Guess the value of $X$ by asking binary (yes/no) questions.
- Ask "Is $X = a$?"; if not, ask "Is $X = b$?"; if not, ask "Is $X = c$?". The expected number of questions asked this way is $1\cdot\frac12+2\cdot\frac14+3\cdot\frac18+3\cdot\frac18=\frac74=1.75$, which equals $H(X)$ (see the sketch after this list).
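The sketch below (illustrative, reusing the `entropy` helper idea from above) reproduces both examples and the expected-number-of-questions calculation:

```python
# Reproduce the two examples and the expected number of yes/no questions.
from math import log2

def entropy(pmf):
    return -sum(p * log2(p) for p in pmf if p > 0)

# Example 1: X ~ Bern(1/2) gives 1 bit of uncertainty.
print(entropy([0.5, 0.5]))                            # 1.0

# Example 2: P(a, b, c, d) = (1/2, 1/4, 1/8, 1/8).
pmf = [1/2, 1/4, 1/8, 1/8]
print(entropy(pmf))                                   # 1.75 bits

# Asking "Is X = a?", then "Is X = b?", then "Is X = c?" uses
# 1, 2, 3, 3 questions for outcomes a, b, c, d respectively.
questions = [1, 2, 3, 3]
print(sum(q * p for q, p in zip(questions, pmf)))     # 1.75
```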
2.2 Joint and Conditional Entropy
Definition of Joint Entropy
$H(X,Y)=-\sum_{x\in{\cal X}}\sum_{y\in{\cal Y}} p(x,y)\log p(x,y)$
Equivalently, $H(X,Y)=-E \log p(X,Y)$.
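A minimal sketch of this definition, assuming a joint pmf stored as a Python dict `{(x, y): p(x, y)}` (an illustrative layout, reused in the sketches below):

```python
# H(X,Y) = -sum_{x,y} p(x,y) log2 p(x,y) from a joint pmf dict.
from math import log2

joint = {('a', 0): 1/4, ('a', 1): 1/4, ('b', 0): 1/2}  # zero-probability pairs omitted
H_XY = -sum(p * log2(p) for p in joint.values() if p > 0)
print(H_XY)  # 1.5 bits
```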
Definition of Conditional Entropy
If $(X,Y)\sim p(x,y)$ (a joint distribution),
the conditional entropy $H(Y|X)$ is defined as the average of $H(Y|X=x)$ over $x$, weighted by $p(x)$: $$ \begin{aligned} H(Y|X)&=\sum_{x\in\mathcal X}p(x)H(Y|X=x) \\&= -\sum_{x\in\mathcal X}p(x)\sum_{y\in\mathcal Y}p(y|x)\log p(y|x) \\&= -\sum_{x\in\mathcal X}\sum_{y\in\mathcal Y}p(x,y)\log p(y|x) \\&= -{E}_{X,Y}\log p(Y|X), \end{aligned} $$ where $H(Y|X=x)$ is simply the entropy of the conditional distribution $p(y|x)$.
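A sketch of $H(Y|X)$ computed directly from the last formula above, using the same joint-pmf dict layout as before (illustrative names):

```python
# H(Y|X) = -sum_{x,y} p(x,y) log2 p(y|x), with p(y|x) = p(x,y)/p(x).
from math import log2
from collections import defaultdict

joint = {('a', 0): 1/4, ('a', 1): 1/4, ('b', 0): 1/2}

px = defaultdict(float)                 # marginal p(x) = sum_y p(x,y)
for (x, _), p in joint.items():
    px[x] += p

H_Y_given_X = -sum(p * log2(p / px[x]) for (x, _), p in joint.items() if p > 0)
print(H_Y_given_X)  # 0.5 bits: H(Y|X=a) = 1, H(Y|X=b) = 0, each weighted by 1/2
```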
The Chain Rule
$$ \begin{aligned} H(Y|X)&=-\sum_{x\in\mathcal X}\sum_{y\in\mathcal Y}p(x,y)\log p(y|x) \\&=-\sum_{x\in\mathcal X}\sum_{y\in\mathcal Y}p(x,y)\log{p(x,y)\over p(x)} \\&=-\sum_{x\in\mathcal X}\sum_{y\in\mathcal Y}p(x,y)\log{p(x,y)}+\sum_{x\in\mathcal X}\Big(\sum_{y\in\mathcal Y}p(x,y)\Big)\log{p(x)} \\&=-\sum_{x\in\mathcal X}\sum_{y\in\mathcal Y}p(x,y)\log{p(x,y)}+\sum_{x\in\mathcal X}p(x)\log{p(x)} \\&=H(X,Y)-H(X) \end{aligned} $$
This derivation gives the chain rule $$ H(X,Y)=H(X)+H(Y|X) $$ which is natural; equivalently, it follows from $p(X,Y)=p(X)\,p(Y|X)$, since
$$ \log p(X,Y)=\log p(X)+\log p(Y|X) $$
and taking expectations of both sides yields the identity above.
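A numerical check of the chain rule on the same small joint pmf (illustrative sketch):

```python
# Verify H(X,Y) = H(X) + H(Y|X) on a small joint pmf.
from math import log2
from collections import defaultdict

joint = {('a', 0): 1/4, ('a', 1): 1/4, ('b', 0): 1/2}

px = defaultdict(float)
for (x, _), p in joint.items():
    px[x] += p

H_XY = -sum(p * log2(p) for p in joint.values())
H_X = -sum(p * log2(p) for p in px.values())
H_Y_given_X = -sum(p * log2(p / px[x]) for (x, _), p in joint.items())

print(H_XY, H_X + H_Y_given_X)  # both 1.5 bits
```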
Chain Rule Corollary (Conditional Entropy)
$$ H(X,Y|Z)=H(X|Z)+H(Y|X,Z) $$
Hint: view $p(\cdot\,|\,z)$ and $H(\cdot\,|\,Z)$ as a pmf $\hat p(\cdot)$ and entropy $\hat H(\cdot)$, and apply the chain rule above.
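A sketch checking the corollary on a small three-variable pmf $p(x,y,z)$ (the pmf and variable names are illustrative, not from the notes):

```python
# Verify H(X,Y|Z) = H(X|Z) + H(Y|X,Z) directly from the definitions.
from math import log2
from collections import defaultdict

pxyz = {(0, 0, 0): 1/4, (0, 1, 0): 1/4, (1, 0, 1): 1/4, (1, 1, 1): 1/8, (0, 0, 1): 1/8}

pz, pxz = defaultdict(float), defaultdict(float)   # marginals p(z) and p(x,z)
for (x, y, z), p in pxyz.items():
    pz[z] += p
    pxz[(x, z)] += p

H_XY_given_Z = -sum(p * log2(p / pz[z]) for (x, y, z), p in pxyz.items())
H_X_given_Z  = -sum(p * log2(pxz[(x, z)] / pz[z]) for (x, y, z), p in pxyz.items())
H_Y_given_XZ = -sum(p * log2(p / pxz[(x, z)]) for (x, y, z), p in pxyz.items())

print(H_XY_given_Z, H_X_given_Z + H_Y_given_XZ)  # equal
```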
Property: $H(X)-H(X|Y)=H(Y)-H(Y|X)$
Note that in general $H(Y|X)\ne H(X|Y)$, but applying the chain rule in both orders gives $H(X)+H(Y|X)=H(X,Y)=H(Y)+H(X|Y)$, which leads to $H(X)-H(X|Y)=H(Y)-H(Y|X)$.
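A quick check of this symmetry on the same joint pmf, with the conditional entropies computed directly from their definitions (both sides equal the mutual information introduced in the next section; variable names are illustrative):

```python
# Check H(X) - H(X|Y) = H(Y) - H(Y|X).
from math import log2
from collections import defaultdict

joint = {('a', 0): 1/4, ('a', 1): 1/4, ('b', 0): 1/2}

px, py = defaultdict(float), defaultdict(float)
for (x, y), p in joint.items():
    px[x] += p
    py[y] += p

H_X = -sum(p * log2(p) for p in px.values())
H_Y = -sum(p * log2(p) for p in py.values())
H_X_given_Y = -sum(p * log2(p / py[y]) for (x, y), p in joint.items())
H_Y_given_X = -sum(p * log2(p / px[x]) for (x, y), p in joint.items())

print(H_X - H_X_given_Y, H_Y - H_Y_given_X)  # identical (about 0.3113 bits)
```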
2.3 Relative Entropy (KL Distance) and Mutual Information
Definition of Relative Entropy (KL Divergence)
The relative entropy, i.e., the Kullback-Leibler (KL) distance, between two probability mass functions $p(x)$ and $q(x)$ is defined as $$ D(p||q)=\sum_{x\in\mathcal X}p(x)\log{p(x)\over q(x)}\equiv E_p\log{p(X)\over q(X)} $$
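A minimal sketch of $D(p\|q)$ for two pmfs on the same alphabet (the function name `kl_divergence` is an illustrative choice; the standard conventions $0\log\frac{0}{q}=0$ and $p\log\frac{p}{0}=\infty$ are applied):

```python
# D(p||q) = sum_x p(x) log2( p(x)/q(x) ), in bits.
from math import log2, inf

def kl_divergence(p, q):
    total = 0.0
    for px, qx in zip(p, q):
        if px == 0:
            continue               # 0 * log(0/q) = 0
        if qx == 0:
            return inf             # p(x) > 0 but q(x) = 0 => D = infinity
        total += px * log2(px / qx)
    return total

p = [1/2, 1/4, 1/8, 1/8]
q = [1/4, 1/4, 1/4, 1/4]           # uniform reference distribution
print(kl_divergence(p, q))         # log2(4) - H(p) = 2 - 1.75 = 0.25 bits
print(kl_divergence(p, p))         # 0.0: D(p||p) = 0
```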