Data Structure and Algorithm
Algorithms
Greedy Algorithms
Huffman Coding

Huffman Coding

Huffman Coding is a popular lossless data compression algorithm. It uses a Greedy approach to assign variable-length binary codes to input characters based on their frequencies of occurrence.

  • The most frequent character gets the smallest code.
  • The least frequent character gets the longest code.

Prefix Rule (Prefix Codes)

Huffman Coding produces prefix codes (or prefix-free codes), meaning no code assigned to a character is a prefix of any other character's code. This property is crucial because it ensures unambiguous decoding of the compressed bitstream.

For example, if code for A is 0 and code for B is 10, the compressed string 0100 can only be decoded as A B A (0 -> A, 10 -> B, 0 -> A).


Steps to Build the Huffman Tree

Given a list of characters and their frequencies:

  1. Create a leaf node for each unique character and add it to a Min-Heap (Priority Queue, compared by frequency).
  2. Extract the two nodes with the lowest frequencies from the min-heap.
  3. Create a new internal node with a frequency equal to the sum of the two nodes' frequencies.
    • Set the first extracted node as its left child.
    • Set the second extracted node as its right child.
    • Insert this new internal node back into the min-heap.
  4. Repeat steps 2 and 3 until the min-heap contains only one node. This remaining node is the root of the Huffman Tree.

Trace Walkthrough

Consider characters and their frequencies:

  • a: 5, b: 9, c: 12, d: 13, e: 16, f: 45
Step 1: Extract (a: 5) and (b: 9). Create internal node (14) with children a and b.
        Heap: [c:12, d:13, internal:14, e:16, f:45]

Step 2: Extract (c: 12) and (d: 13). Create internal node (25) with children c and d.
        Heap: [internal:14, e:16, internal:25, f:45]

Step 3: Extract (internal: 14) and (e: 16). Create internal node (30).
        Heap: [internal:25, internal:30, f:45]

Step 4: Extract (internal: 25) and (internal: 30). Create internal node (55).
        Heap: [f:45, internal:55]

Step 5: Extract (f: 45) and (internal: 55). Create internal node (100).
        Heap: [internal:100] (Root)

Traversing the Tree

Starting from the root, we assign 0 for moving to the left child and 1 for moving to the right child.

Character   Huffman Code
f           0
c           100
d           101
a           1100
b           1101
e           111

JavaScript Code Implementation

// A Huffman Tree node
class HuffmanNode {
    constructor(char, freq) {
        this.char = char;
        this.freq = freq;
        this.left = null;
        this.right = null;
    }
}
 
// Simple Min-Heap implementation for Priority Queue
class MinHeap {
    constructor() {
        this.data = [];
    }
 
    push(node) {
        this.data.push(node);
        this.upHeap(this.data.length - 1);
    }
 
    pop() {
        if (this.data.length === 0) return null;
        const top = this.data[0];
        const bottom = this.data.pop();
        if (this.data.length > 0) {
            this.data[0] = bottom;
            this.downHeap(0);
        }
        return top;
    }
 
    size() {
        return this.data.length;
    }
 
    upHeap(i) {
        while (i > 0) {
            const p = Math.floor((i - 1) / 2);
            if (this.data[i].freq >= this.data[p].freq) break;
            this.swap(i, p);
            i = p;
        }
    }
 
    downHeap(i) {
        const len = this.data.length;
        while (2 * i + 1 < len) {
            let left = 2 * i + 1;
            let right = left + 1;
            let smallest = i;
 
            if (this.data[left].freq < this.data[smallest].freq) {
                smallest = left;
            }
            if (right < len && this.data[right].freq < this.data[smallest].freq) {
                smallest = right;
            }
            if (smallest === i) break;
            this.swap(i, smallest);
            i = smallest;
        }
    }
 
    swap(i, j) {
        const temp = this.data[i];
        this.data[i] = this.data[j];
        this.data[j] = temp;
    }
}
 
// Function to print codes from the Huffman Tree
function printCodes(root, code, result) {
    if (!root) return;
 
    // If it's a leaf node, print the character and code
    if (!root.left && !root.right) {
        result[root.char] = code;
        return;
    }
 
    printCodes(root.left, code + "0", result);
    printCodes(root.right, code + "1", result);
}
 
// Main Huffman Coding driver function
function buildHuffmanTree(chars, freqs) {
    const minHeap = new MinHeap();
 
    // 1. Create a leaf node for each character and push to heap
    for (let i = 0; i < chars.length; i++) {
        minHeap.push(new HuffmanNode(chars[i], freqs[i]));
    }
 
    // 2. Combine nodes until only one remains
    while (minHeap.size() > 1) {
        const left = minHeap.pop();
        const right = minHeap.pop();
 
        // Create internal node with combined frequency
        const internal = new HuffmanNode(null, left.freq + right.freq);
        internal.left = left;
        internal.right = right;
 
        minHeap.push(internal);
    }
 
    // 3. Print codes starting from root
    const root = minHeap.pop();
    const codes = {};
    printCodes(root, "", codes);
 
    return codes;
}
 
// Driver code matching the walkthrough example
const chars = ["a", "b", "c", "d", "e", "f"];
const freqs = [5, 9, 12, 13, 16, 45];
 
const huffmanCodes = buildHuffmanTree(chars, freqs);
 
console.log("Character \t Huffman Code");
Object.keys(huffmanCodes).sort().forEach(char => {
    console.log(`${char} \t\t ${huffmanCodes[char]}`);
});
/*
Output:
a 		 1100
b 		 1101
c 		 100
d 		 101
e 		 111
f 		 0
*/

Complexity Analysis

  • Time Complexity: O(N log N)
    • Inserting $N$ unique nodes into the min-heap takes $O(N \log N)$ time.
    • The loop combining nodes runs $N-1$ times, each taking $O(\log N)$ heap operations.
  • Space Complexity: O(N)
    • To store tree nodes and heap structures.