INFOTH Note 7: Huffman & Arithmetic Coding
7.1 Huffman Coding
The Algorithm
For a source distribution $(p(x),\ x\in\mathcal X)$:
- Sort the probability vector $\bold p$ in decreasing order.
- “Merge” the two smallest probabilities into one.
- Re-sort and repeat the merge until a single probability (mass $1$) remains.
How to merge: suppose the two smallest-probability symbols are $a$ and $b$.
- Create a node $ab$ with two children $a$ and $b$.
- Assign (e.g.) $0$ to the edge to $a$ and $1$ to the edge to $b$.
Finally, we obtain a Huffman tree; for each symbol, the codeword is the sequence of edge labels on the path from the root (all symbols merged) to that symbol’s leaf.


Since every codeword ends at a leaf, no codeword is a prefix of another: we get a prefix-free code.
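The sort-and-merge loop above can be sketched with a min-heap. This is a minimal illustration, not the notes’ own implementation; `huffman_code` and the example distribution are hypothetical.

```python
import heapq
from itertools import count

def huffman_code(probs):
    """Build a Huffman code for {symbol: probability} by repeatedly
    merging the two smallest probabilities (a sketch, not a library API)."""
    tick = count()  # tie-breaker so the heap never compares the dicts
    # Each heap entry: (probability, tie_breaker, {symbol: partial codeword})
    heap = [(p, next(tick), {s: ""}) for s, p in probs.items()]
    heapq.heapify(heap)
    while len(heap) > 1:
        p0, _, c0 = heapq.heappop(heap)  # smallest probability
        p1, _, c1 = heapq.heappop(heap)  # second smallest
        # Merging = prepend 0 to one subtree's codewords, 1 to the other's
        merged = {s: "0" + w for s, w in c0.items()}
        merged.update({s: "1" + w for s, w in c1.items()})
        heapq.heappush(heap, (p0 + p1, next(tick), merged))
    return heap[0][2]

code = huffman_code({"a": 0.5, "b": 0.25, "c": 0.15, "d": 0.10})
print(code)  # codeword lengths 1, 2, 3, 3; exact bits depend on tie-breaking
```

The resulting code is prefix-free because each symbol’s codeword is built from the edge labels on its root-to-leaf path, exactly as described above.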
Proof of Optimality
Optimality: Huffman coding minimizes the expected codeword length among all prefix-free codes.
Proof.
1. In any optimal prefix-free code $c$, if $p(x)>p(x')$, then $l(c(x))\le l(c(x'))$ (the more probable symbol gets the shorter codeword).
   - If not, exchanging $c(x)$ and $c(x')$ gives a strictly better code.
2. There are at least 2 codewords having maximum length in the optimal code $c$.
   - If a unique $x$ had $l(c(x))=l_{\max}=\max(l(c(y)),\ y\in\mathcal X)$, we could replace $c(x)$ by the prefix obtained by dropping its last bit; the code would remain prefix-free and become strictly better.
3. From 2, we can assume that the codewords of the two lowest-probability symbols (not necessarily of equal probability) have maximum length and differ only in the last bit.
7.2 Arithmetic Coding
Sequential Property: Dynamic Update
Unlike Huffman coding, which must build a full codebook (a symbol-to-codeword map) in advance, arithmetic coding encodes sequentially: individual symbols are not mapped to fixed $0/1$ strings.
We can also keep decoding while the code sequence is still growing.
The Algorithm
Assume the probabilities are known and $X_1,X_2,\dots$ are i.i.d. with distribution $p$.
Fix a total order on $\mathcal X$ so that sequences can be compared lexicographically.
Define the CDF
$$ F(\bold x^n):=\sum_{\bold y^n\le \bold x^n}p(\bold y^n) $$

for blocks of length $n$, and assign to the block $\bold x^n$ the interval $[F(\bold x^n)-p(\bold x^n),\ F(\bold x^n))\subseteq [0,1]$.
To encode an $n$-length block, we can choose any point in this interval, such as the midpoint $\bar F(\bold x^n)$, and write the chosen number in binary: $0.b_1b_2b_3\dots$; the bits $b_1b_2\dots$ (truncated to the required length) form the codeword.
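The CDF definition can be checked directly by brute force: enumerate every sequence $\bold y^n\le\bold x^n$ in lexicographic order and sum the probabilities. This exponential-time sketch is for illustration only; `block_interval` is a hypothetical helper, and the distribution matches the example used below.

```python
from itertools import product
from math import prod

def block_interval(block, probs, order):
    """Interval [F(x^n) - p(x^n), F(x^n)) straight from the CDF definition:
    sum p(y^n) over all y^n <= x^n lexicographically (brute force over |X|^n)."""
    rank = {s: i for i, s in enumerate(order)}
    F = 0.0
    for y in product(order, repeat=len(block)):
        if [rank[s] for s in y] <= [rank[s] for s in block]:
            F += prod(probs[s] for s in y)
    p_block = prod(probs[s] for s in block)
    return F - p_block, F

lo, hi = block_interval("AB", {"A": 0.5, "B": 0.3, "C": 0.2}, "ABC")
print(lo, hi)  # the interval [0.25, 0.4)
```

Only $AA$ and $AB$ are $\le AB$ here, so $F=0.25+0.15=0.4$ and the interval is $[0.25,0.4)$.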
Dynamically Encoding
- On each new symbol, partition the current interval in proportion to the symbol probabilities, and shrink the current interval to the sub-interval of the observed symbol.
- Iterate.
- Finally, output a binary representation of a point in the final interval.
For example, take $p(A)=0.5,\ p(B)=0.3,\ p(C)=0.2$ and encode $AB$:
- Partition $[0,1)$: $[0,0.5)$ for $A$, $[0.5,0.8)$ for $B$, $[0.8,1)$ for $C$. Seeing $A$, enter $[0,0.5)$.
- Partition $[0,0.5)$: $[0,0.25)$ for $A$, $[0.25,0.4)$ for $B$, $[0.4,0.5)$ for $C$. Seeing $B$, enter $[0.25,0.4)$.
- Choose a point, such as the midpoint $(0.25+0.4)/2=0.325$, and write it in binary.
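The interval-refinement steps above can be sketched as follows. The function `encode` is a hypothetical helper, not a standard API; it refines the interval once per symbol and emits $\lceil\log_2(1/p)\rceil+1$ bits of the midpoint, as discussed in the length formula below.

```python
from math import ceil, log2, prod

def encode(seq, probs, order):
    """Sequential arithmetic encoding sketch: shrink [lo, hi) once per
    symbol, then write ceil(log2(1/p)) + 1 bits of the midpoint."""
    lo, hi = 0.0, 1.0
    for s in seq:
        width = hi - lo
        left = lo
        # partition the current interval proportionally to the probabilities
        for t in order:
            if t == s:
                lo, hi = left, left + width * probs[t]
                break
            left += width * probs[t]
    p = prod(probs[s] for s in seq)
    n_bits = ceil(log2(1.0 / p)) + 1
    mid = (lo + hi) / 2
    bits = ""
    for _ in range(n_bits):  # binary expansion of the midpoint
        mid *= 2
        bit = int(mid)
        bits += str(bit)
        mid -= bit
    return bits

print(encode("AB", {"A": 0.5, "B": 0.3, "C": 0.2}, "ABC"))  # "0101"
```

For $AB$, $p=0.15$, so $4$ bits of the midpoint $0.325$ are emitted: $0.0101_2 = 0.3125 \in [0.25, 0.4)$.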
Properties
- Although it is not optimal for any fixed block length $n$ (Huffman is), it can encode and decode on the fly.
- The dynamically generated block codewords are prefix-free.
Finally, the codeword length is
$$ l(c(\bold x^n))=\left\lceil\log \frac1{p(\bold x^n)}\right\rceil + 1\le \log{1\over p(\bold x^n)}+2, $$
so the expected length per symbol exceeds the entropy by at most $2/n$ bits.
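The stated bound can be checked numerically on the example distribution above (a quick sanity check, not part of the notes):

```python
from itertools import product
from math import ceil, log2, prod

probs = {"A": 0.5, "B": 0.3, "C": 0.2}  # the example distribution above
for x in product(probs, repeat=2):      # all length-2 blocks
    p = prod(probs[s] for s in x)
    l = ceil(log2(1.0 / p)) + 1         # l(c(x^n)) = ceil(log 1/p) + 1
    assert l <= log2(1.0 / p) + 2       # the stated upper bound
    print("".join(x), p, l)
```

For instance, the block $AB$ has $p=0.15$ and codeword length $\lceil\log_2(1/0.15)\rceil+1=4\le\log_2(1/0.15)+2\approx 4.74$.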