Huffman Coding
Huffman Coding is a popular lossless data compression algorithm. It uses a Greedy approach to assign variable-length binary codes to input characters based on their frequencies of occurrence.
- The most frequent character gets the smallest code.
- The least frequent character gets the longest code.
Prefix Rule (Prefix Codes)
Huffman Coding produces prefix codes (or prefix-free codes), meaning no code assigned to a character is a prefix of any other character's code. This property is crucial because it ensures unambiguous decoding of the compressed bitstream.
For example, if code for A is 0 and code for B is 10, the compressed string 0100 can only be decoded as A B A (0 -> A, 10 -> B, 0 -> A).
Steps to Build the Huffman Tree
Given a list of characters and their frequencies:
- Create a leaf node for each unique character and add it to a Min-Heap (Priority Queue, compared by frequency).
- Extract the two nodes with the lowest frequencies from the min-heap.
- Create a new internal node with a frequency equal to the sum of the two nodes' frequencies.
- Set the first extracted node as its left child.
- Set the second extracted node as its right child.
- Insert this new internal node back into the min-heap.
- Repeat steps 2 and 3 until the min-heap contains only one node. This remaining node is the root of the Huffman Tree.
Trace Walkthrough
Consider characters and their frequencies:
a: 5,b: 9,c: 12,d: 13,e: 16,f: 45
Step 1: Extract (a: 5) and (b: 9). Create internal node (14) with children a and b.
Heap: [c:12, d:13, internal:14, e:16, f:45]
Step 2: Extract (c: 12) and (d: 13). Create internal node (25) with children c and d.
Heap: [internal:14, e:16, internal:25, f:45]
Step 3: Extract (internal: 14) and (e: 16). Create internal node (30).
Heap: [internal:25, internal:30, f:45]
Step 4: Extract (internal: 25) and (internal: 30). Create internal node (55).
Heap: [f:45, internal:55]
Step 5: Extract (f: 45) and (internal: 55). Create internal node (100).
Heap: [internal:100] (Root)Traversing the Tree
Starting from the root, we assign 0 for moving to the left child and 1 for moving to the right child.
Character Huffman Code
f 0
c 100
d 101
a 1100
b 1101
e 111JavaScript Code Implementation
// A Huffman Tree node
class HuffmanNode {
constructor(char, freq) {
this.char = char;
this.freq = freq;
this.left = null;
this.right = null;
}
}
// Simple Min-Heap implementation for Priority Queue
class MinHeap {
constructor() {
this.data = [];
}
push(node) {
this.data.push(node);
this.upHeap(this.data.length - 1);
}
pop() {
if (this.data.length === 0) return null;
const top = this.data[0];
const bottom = this.data.pop();
if (this.data.length > 0) {
this.data[0] = bottom;
this.downHeap(0);
}
return top;
}
size() {
return this.data.length;
}
upHeap(i) {
while (i > 0) {
const p = Math.floor((i - 1) / 2);
if (this.data[i].freq >= this.data[p].freq) break;
this.swap(i, p);
i = p;
}
}
downHeap(i) {
const len = this.data.length;
while (2 * i + 1 < len) {
let left = 2 * i + 1;
let right = left + 1;
let smallest = i;
if (this.data[left].freq < this.data[smallest].freq) {
smallest = left;
}
if (right < len && this.data[right].freq < this.data[smallest].freq) {
smallest = right;
}
if (smallest === i) break;
this.swap(i, smallest);
i = smallest;
}
}
swap(i, j) {
const temp = this.data[i];
this.data[i] = this.data[j];
this.data[j] = temp;
}
}
// Function to print codes from the Huffman Tree
function printCodes(root, code, result) {
if (!root) return;
// If it's a leaf node, print the character and code
if (!root.left && !root.right) {
result[root.char] = code;
return;
}
printCodes(root.left, code + "0", result);
printCodes(root.right, code + "1", result);
}
// Main Huffman Coding driver function
function buildHuffmanTree(chars, freqs) {
const minHeap = new MinHeap();
// 1. Create a leaf node for each character and push to heap
for (let i = 0; i < chars.length; i++) {
minHeap.push(new HuffmanNode(chars[i], freqs[i]));
}
// 2. Combine nodes until only one remains
while (minHeap.size() > 1) {
const left = minHeap.pop();
const right = minHeap.pop();
// Create internal node with combined frequency
const internal = new HuffmanNode(null, left.freq + right.freq);
internal.left = left;
internal.right = right;
minHeap.push(internal);
}
// 3. Print codes starting from root
const root = minHeap.pop();
const codes = {};
printCodes(root, "", codes);
return codes;
}
// Driver code matching the walkthrough example
const chars = ["a", "b", "c", "d", "e", "f"];
const freqs = [5, 9, 12, 13, 16, 45];
const huffmanCodes = buildHuffmanTree(chars, freqs);
console.log("Character \t Huffman Code");
Object.keys(huffmanCodes).sort().forEach(char => {
console.log(`${char} \t\t ${huffmanCodes[char]}`);
});
/*
Output:
a 1100
b 1101
c 100
d 101
e 111
f 0
*/Complexity Analysis
- Time Complexity: O(N log N)
- Inserting $N$ unique nodes into the min-heap takes $O(N \log N)$ time.
- The loop combining nodes runs $N-1$ times, each taking $O(\log N)$ heap operations.
- Space Complexity: O(N)
- To store tree nodes and heap structures.