Huffman Trees

Solves the optimization problem to compress a file into the minimum number of bits
A form of lossless compression
Does not compress when every character has the same weight
Greedy algorithm because the most frequently used characters get the shortest encoding

Structure

A full Binary Tree where each leaf corresponds to a character
We collect the frequency of different characters and build a tree whose paths represents characters
- Characters with larger frequencies have shorter distances from the root
- Branching left or right from a given node represents 0 or 1 respectively
- The path taken to arrive at a character’s leaf is the Huffman encoding for that character

Algorithm

Create a table with symbols and their frequencies
Construct a set of trees with root nodes that contain each of the individual symbols and their weight (frequency)
Place the set of trees into (min) priority queue
while the priority queue has more than one item
1. Remove the two trees with the smallest weights
2. Combine them into a new binary tree in which the weight of the tree root is the sum of the weights of its children
3. Insert the newly created tree back into the priority queue
Traverse the resulting tree to obtain binary codes for characters (append 0 to code when towards left and 1 when towards right)

Procedure Huffman(C):
	n = C.size
	Q = priority_queue()
 
	for i = 1 to n  
		x = node(C[i])
		Q.push(x)
	end for
 
	while Q.size() is greater than 1
		Z = new node()
		Z.left = x = Q.pop
		Z.right = y = Q.pop
		Z.freq = x.freq + y.freq Q.push(Z)
	end while
	
	Return Q.top()

Liam's UF Notes

Explorer

Huffman Trees

Structure

Algorithm

Graph View

Table of Contents

Backlinks

Explorer