INFOTH Note 7: Huffman & Arithmetic Coding
7.1 Huffman Coding
The Algorithm
For a source distribution $(p(x),\ x\in\mathcal X)$:
- Sort the probability vector $\bold p$ in decreasing order.
- “Merge” the two smallest probabilities into one.
- Re-sort and repeat the merge until a single probability (mass $1$) remains.
How to merge: suppose the two smallest-probability symbols are $a$ and $b$.
- Create a node $ab$ with two children $a$ and $b$.
- Assign (e.g.) $0$ to the edge to $a$ and $1$ to the edge to $b$.
Finally, we obtain a Huffman tree; for each symbol, the codeword is the sequence of edge labels on the path from the root (all symbols merged) to that symbol’s leaf.


Since every codeword ends at a leaf, no codeword is a prefix of another: we get a prefix-free code.
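The sort-and-merge loop above can be sketched with a min-heap. This is a minimal illustration, not the notes’ own implementation; `huffman_code` and the example distribution are hypothetical.

```python
import heapq
from itertools import count

def huffman_code(probs):
    """Build a Huffman code for {symbol: probability} by repeatedly
    merging the two smallest probabilities (a sketch, not a library API)."""
    tick = count()  # tie-breaker so the heap never compares the dicts
    # Each heap entry: (probability, tie_breaker, {symbol: partial codeword})
    heap = [(p, next(tick), {s: ""}) for s, p in probs.items()]
    heapq.heapify(heap)
    while len(heap) > 1:
        p0, _, c0 = heapq.heappop(heap)  # smallest probability
        p1, _, c1 = heapq.heappop(heap)  # second smallest
        # Merging = prepend 0 to one subtree's codewords, 1 to the other's
        merged = {s: "0" + w for s, w in c0.items()}
        merged.update({s: "1" + w for s, w in c1.items()})
        heapq.heappush(heap, (p0 + p1, next(tick), merged))
    return heap[0][2]

code = huffman_code({"a": 0.5, "b": 0.25, "c": 0.15, "d": 0.10})
print(code)  # codeword lengths 1, 2, 3, 3; exact bits depend on tie-breaking
```

The resulting code is prefix-free because each symbol’s codeword is built from the edge labels on its root-to-leaf path, exactly as described above.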
Proof of Optimality
Optimality: Huffman coding minimizes the expected codeword length among all prefix-free codes.
Proof.
1. In any optimal prefix-free code $c$, if $p(x)>p(x')$, then $l(c(x))\le l(c(x'))$ (the more probable symbol gets the shorter codeword).
   - If not, exchanging $c(x)$ and $c(x')$ gives a strictly better code.
2. There are at least 2 codewords having maximum length in the optimal code $c$.
   - If a unique $x$ had $l(c(x))=l_{\max}=\max(l(c(y)),\ y\in\mathcal X)$, we could replace $c(x)$ by the prefix obtained by dropping its last bit; the code would remain prefix-free and become strictly better.
3. From 2, we can assume that the codewords of the two lowest-probability symbols (not necessarily of equal probability) have maximum length and differ only in the last bit.
7.2 Arithmetic Coding
Sequential Property: Dynamic Update
Unlike Huffman coding, which must build a full codebook (a symbol-to-codeword map) in advance, arithmetic coding encodes sequentially: individual symbols are not mapped to fixed $0/1$ strings.
We can also keep decoding while the code sequence is still growing.
The Algorithm
Assume the probabilities are known and $X_1,X_2,\dots$ are i.i.d. with distribution $p$.
Fix a total order on $\mathcal X$ so that sequences can be compared lexicographically.
Define the CDF
$$ F(\bold x^n):=\sum_{\bold y^n\le \bold x^n}p(\bold y^n) $$

for blocks of length $n$, and assign to the block $\bold x^n$ the interval $[F(\bold x^n)-p(\bold x^n),\ F(\bold x^n))\subseteq [0,1]$.
To encode an $n$-length block, we can choose any point in this interval, such as the midpoint $\bar F(\bold x^n)$, and write the chosen number in binary: $0.b_1b_2b_3\dots$; the bits $b_1b_2\dots$ (truncated to the required length) form the codeword.
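The CDF definition can be checked directly by brute force: enumerate every sequence $\bold y^n\le\bold x^n$ in lexicographic order and sum the probabilities. This exponential-time sketch is for illustration only; `block_interval` is a hypothetical helper, and the distribution matches the example used below.

```python
from itertools import product
from math import prod

def block_interval(block, probs, order):
    """Interval [F(x^n) - p(x^n), F(x^n)) straight from the CDF definition:
    sum p(y^n) over all y^n <= x^n lexicographically (brute force over |X|^n)."""
    rank = {s: i for i, s in enumerate(order)}
    F = 0.0
    for y in product(order, repeat=len(block)):
        if [rank[s] for s in y] <= [rank[s] for s in block]:
            F += prod(probs[s] for s in y)
    p_block = prod(probs[s] for s in block)
    return F - p_block, F

lo, hi = block_interval("AB", {"A": 0.5, "B": 0.3, "C": 0.2}, "ABC")
print(lo, hi)  # the interval [0.25, 0.4)
```

Only $AA$ and $AB$ are $\le AB$ here, so $F=0.25+0.15=0.4$ and the interval is $[0.25,0.4)$.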
Dynamically Encoding
- On each new symbol, partition the current interval in proportion to the symbol probabilities, and shrink the current interval to the sub-interval of the observed symbol.
- Iterate.
- Finally, output a binary representation of a point in the final interval.
For example, take $p(A)=0.5,\ p(B)=0.3,\ p(C)=0.2$ and encode $AB$:
- Partition $[0,1)$: $[0,0.5)$ for $A$, $[0.5,0.8)$ for $B$, $[0.8,1)$ for $C$. Seeing $A$, enter $[0,0.5)$.
- Partition $[0,0.5)$: $[0,0.25)$ for $A$, $[0.25,0.4)$ for $B$, $[0.4,0.5)$ for $C$. Seeing $B$, enter $[0.25,0.4)$.
- Choose a point, such as the midpoint $(0.25+0.4)/2=0.325$, and write it in binary.
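The interval-refinement steps above can be sketched as follows. The function `encode` is a hypothetical helper, not a standard API; it refines the interval once per symbol and emits $\lceil\log_2(1/p)\rceil+1$ bits of the midpoint, as discussed in the length formula below.

```python
from math import ceil, log2, prod

def encode(seq, probs, order):
    """Sequential arithmetic encoding sketch: shrink [lo, hi) once per
    symbol, then write ceil(log2(1/p)) + 1 bits of the midpoint."""
    lo, hi = 0.0, 1.0
    for s in seq:
        width = hi - lo
        left = lo
        # partition the current interval proportionally to the probabilities
        for t in order:
            if t == s:
                lo, hi = left, left + width * probs[t]
                break
            left += width * probs[t]
    p = prod(probs[s] for s in seq)
    n_bits = ceil(log2(1.0 / p)) + 1
    mid = (lo + hi) / 2
    bits = ""
    for _ in range(n_bits):  # binary expansion of the midpoint
        mid *= 2
        bit = int(mid)
        bits += str(bit)
        mid -= bit
    return bits

print(encode("AB", {"A": 0.5, "B": 0.3, "C": 0.2}, "ABC"))  # "0101"
```

For $AB$, $p=0.15$, so $4$ bits of the midpoint $0.325$ are emitted: $0.0101_2 = 0.3125 \in [0.25, 0.4)$.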
Properties
- Although it is not optimal for any fixed block length $n$ (Huffman is), it can encode and decode on the fly.
- The dynamically generated block codewords are prefix-free.
Finally, the codeword length is
$$ l(c(\bold x^n))=\left\lceil\log \frac1{p(\bold x^n)}\right\rceil + 1\le \log{1\over p(\bold x^n)}+2, $$
so the expected length per symbol exceeds the entropy by at most $2/n$ bits.
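The stated bound can be checked numerically on the example distribution above (a quick sanity check, not part of the notes):

```python
from itertools import product
from math import ceil, log2, prod

probs = {"A": 0.5, "B": 0.3, "C": 0.2}  # the example distribution above
for x in product(probs, repeat=2):      # all length-2 blocks
    p = prod(probs[s] for s in x)
    l = ceil(log2(1.0 / p)) + 1         # l(c(x^n)) = ceil(log 1/p) + 1
    assert l <= log2(1.0 / p) + 2       # the stated upper bound
    print("".join(x), p, l)
```

For instance, the block $AB$ has $p=0.15$ and codeword length $\lceil\log_2(1/0.15)\rceil+1=4\le\log_2(1/0.15)+2\approx 4.74$.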