INFOTH Note 7: Huffman & Arithmetic Coding

7.1 Huffman Coding

The Algorithm

Given a source distribution $(p(x),\ x\in\mathcal X)$,

  1. Sort the probability vector $\bold p$ in decreasing order,
  2. “Merge” the two smallest probabilities into one,
  3. Re-sort and repeat the merging until a single node (the root) remains.

How to merge: suppose the two smallest “symbols” are $a$ and $b$

  1. Create a node $ab$ with the two children $a$ and $b$
  2. Assign (e.g.) $0$ to the edge into $a$ and $1$ to the edge into $b$

Finally, we obtain a Huffman tree, where the edge labels along each path from the root (all symbols) to a leaf (one symbol) spell out that symbol’s codeword.

*(Figure: an example Huffman tree.)*

Since every codeword sits at a leaf, no codeword is a prefix of another: we get a prefix-free code.
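The whole procedure can be sketched in Python, using a binary heap in place of explicit re-sorting (`huffman_code` is our own helper name, not a library routine):

```python
import heapq
from itertools import count

def huffman_code(probs):
    """Huffman code for a dict {symbol: probability} -> {symbol: bitstring}."""
    code = {sym: "" for sym in probs}
    tiebreak = count()  # breaks probability ties so symbol groups are never compared
    # heap entries: (subtree probability, tie-breaker, symbols in the subtree)
    heap = [(p, next(tiebreak), (sym,)) for sym, p in probs.items()]
    heapq.heapify(heap)
    while len(heap) > 1:
        p_a, _, group_a = heapq.heappop(heap)  # smallest probability
        p_b, _, group_b = heapq.heappop(heap)  # second smallest
        for sym in group_a:                    # edge labelled 0
            code[sym] = "0" + code[sym]
        for sym in group_b:                    # edge labelled 1
            code[sym] = "1" + code[sym]
        heapq.heappush(heap, (p_a + p_b, next(tiebreak), group_a + group_b))
    return code

code = huffman_code({"A": 0.5, "B": 0.3, "C": 0.2})
```

For this distribution the codeword lengths are $1, 2, 2$, giving expected length $0.5\cdot 1 + 0.3\cdot 2 + 0.2\cdot 2 = 1.5$ bits.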

Proof of Optimality

Optimality: among all prefix-free codes, the Huffman code minimizes the expected codeword length.

Proof.

  1. In any optimal prefix-free code $c$, if $p(x)>p(x')$, then $l (c(x))\le l (c(x'))$ (the more probable symbol gets the shorter codeword)

    • Otherwise, exchanging $c(x)$ and $c(x')$ gives a strictly better code
  2. In the optimal code $c$, at least 2 codewords have the maximum length.

    • If there were a unique $x$ with $l(c(x))=l_{max}=\max(l(c(y)),y\in\mathcal X)$,

    • we could replace $c(x)$ by the prefix obtained by dropping its last bit; the code would stay prefix-free and become strictly better

From 2, we may assume that the codewords corresponding to the two lowest-probability symbols (not necessarily of equal probability) differ only in the last bit. Merging these two symbols into one, with the summed probability, reduces the problem to an alphabet of size $|\mathcal X|-1$, and optimality of the full procedure follows by induction.
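The optimality claim can be sanity-checked by brute force on a small alphabet. The sketch below (helper names `huffman_lengths` and `brute_force_optimum` are our own) enumerates every length vector admissible under Kraft’s inequality $\sum_i 2^{-l_i}\le 1$, which characterizes the lengths achievable by prefix-free codes, and confirms that the Huffman expected length attains the minimum:

```python
import heapq
import itertools
from itertools import count

def huffman_lengths(probs):
    """Codeword lengths produced by the Huffman merge procedure."""
    tiebreak = count()
    heap = [(p, next(tiebreak), (sym,)) for sym, p in probs.items()]
    heapq.heapify(heap)
    length = {sym: 0 for sym in probs}
    while len(heap) > 1:
        p_a, _, ga = heapq.heappop(heap)   # two smallest probabilities
        p_b, _, gb = heapq.heappop(heap)
        for sym in ga + gb:                # every symbol under the merge gains one bit
            length[sym] += 1
        heapq.heappush(heap, (p_a + p_b, next(tiebreak), ga + gb))
    return length

def brute_force_optimum(probs, max_len=5):
    """Minimum expected length over all Kraft-admissible length vectors."""
    syms = list(probs)
    best = float("inf")
    for ls in itertools.product(range(1, max_len + 1), repeat=len(syms)):
        if sum(2.0 ** -l for l in ls) <= 1.0:  # Kraft's inequality
            best = min(best, sum(probs[s] * l for s, l in zip(syms, ls)))
    return best

p = {"A": 0.5, "B": 0.3, "C": 0.2}
huff_avg = sum(p[s] * l for s, l in huffman_lengths(p).items())
opt_avg = brute_force_optimum(p)
```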

7.2 Arithmetic Coding

Sequential Property: Dynamic Update

Unlike Huffman coding, which must first build a codebook (a symbol-to-codeword map), arithmetic coding encodes sequentially: we do not map individual symbols to fixed $0,1$ strings.

The decoder can likewise keep decoding while the code sequence is still growing.

The Algorithm

Assume the probabilities are known and $X_1,X_2,\dots$ are i.i.d. with distribution $p$.

We fix an order on $\mathcal X$, which induces a (lexicographic) order on sequences, so we can compare them.

Define the CDF

$$ F(\bold x^n):=\sum_{\bold y^n\le \bold x^n}p(\bold y^n) $$

for a fixed length $n$, and assign to the block $\bold x^n$ the interval $[F(\bold x^n)-p(\bold x^n), F(\bold x^n))\subseteq [0,1)$, whose width is exactly $p(\bold x^n)$.

To encode an $n$-length block, we can choose any point in this interval, such as the midpoint $\bar F(\bold x^n)$, and write the chosen number in $[0,1)$ in binary: $0.b_1b_2b_3\dots$; the bits $b_1b_2\dots$, truncated to sufficiently many places, form the codeword.

Dynamically Encoding

  1. On each new symbol, partition the current interval in proportion to the symbol probabilities, and shrink the current interval to the part corresponding to that symbol.
  2. Iterate over the symbols.
  3. Output a binary representation of a point in the final interval.

For example, we have $p(A)=0.5,p(B)=0.3,p(C)=0.2$. Encode AB

  1. [0,0.5) for A, [0.5,0.8) for B, [0.8,1) for C.

    Seeing A. Enter [0,0.5).

  2. [0, 0.25) for A. [0.25, 0.4) for B. [0.4, 0.5) for C.

    Seeing B. Enter [0.25,0.4).

  3. Choose a point, such as the midpoint $(0.25+0.4)/2 = 0.325$, and write it in binary.
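The three steps above, applied to this example, can be sketched in Python (`arithmetic_encode` is our own name; the number of emitted bits uses the $\lceil\log(1/p(\bold x^n))\rceil+1$ rule from the length formula at the end of this section):

```python
import math

def arithmetic_encode(seq, probs, order):
    """Encode seq by dynamic interval refinement; return the codeword bits."""
    low, width = 0.0, 1.0
    for sym in seq:
        # partition [low, low + width) in the order of the symbols
        # and enter the sub-interval belonging to sym
        for s in order:
            if s == sym:
                break
            low += width * probs[s]
        width *= probs[sym]
    mid = low + width / 2                          # any interior point works
    n_bits = math.ceil(math.log2(1 / width)) + 1   # ceil(log 1/p(x^n)) + 1
    bits, x = "", mid
    for _ in range(n_bits):                        # binary expansion 0.b1 b2 ...
        x *= 2
        bits += str(int(x))
        x -= int(x)
    return bits

probs = {"A": 0.5, "B": 0.3, "C": 0.2}
bits = arithmetic_encode("AB", probs, "ABC")   # '0101': 0.0101 = 0.3125, inside [0.25, 0.4)
```

The emitted codeword represents a dyadic point inside the final interval, so the decoder can recover the interval, and hence the sequence, by running the same partitioning.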

Properties

  • Although it’s not optimal for any fixed block length $n$, it’s dynamic.
  • The dynamically generated block codewords are prefix-free.

Finally, the codeword length is

$$ l(c(\bold x^n))=\lceil\log \frac1{p(\bold x^n)}\rceil + 1\le \log{1\over p(\bold x^n)}+2. $$
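As a quick check of this bound on the AB example above (logs base 2): the block probability is $p(\text{AB})=0.5\times 0.3=0.15$, so

$$ l = \left\lceil \log \frac{1}{0.15} \right\rceil + 1 = \lceil 2.74\rceil + 1 = 4 \le \log\frac{1}{0.15} + 2 \approx 4.74, $$

i.e. AB is encoded with 4 bits: the first 4 bits $0101$ of the midpoint $0.325$, whose value $0.3125$ indeed lies in $[0.25, 0.4)$.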