Solves the optimization problem to compress a file into the minimum number of bits
A form of lossless compression
Does not compress when every character has the same weight
Greedy algorithm because the most frequently used characters get the shortest encoding
Structure
A full Binary Tree where each leaf corresponds to a character
We collect the frequency of different characters and build a tree whose paths represents characters
Characters with larger frequencies have shorter distances from the root
Branching left or right from a given node represents 0 or 1 respectively
The path taken to arrive at a character’s leaf is the Huffman encoding for that character
Algorithm
Create a table with symbols and their frequencies
Construct a set of trees with root nodes that contain each of the individual symbols and their weight (frequency)
Place the set of trees into (min) priority queue
while the priority queue has more than one item
Remove the two trees with the smallest weights
Combine them into a new binary tree in which the weight of the tree root is the sum of the weights of its children
Insert the newly created tree back into the priority queue
Traverse the resulting tree to obtain binary codes for characters (append 0 to code when towards left and 1 when towards right)
Procedure Huffman(C): n = C.size Q = priority_queue() for i = 1 to n x = node(C[i]) Q.push(x) end for while Q.size() is greater than 1 Z = new node() Z.left = x = Q.pop Z.right = y = Q.pop Z.freq = x.freq + y.freq Q.push(Z) end while Return Q.top()