What is the Huffman Algorithm
David Albert Huffman (August 9, 1925 — October 7, 1999) was an American computer scientist who developed the Huffman coding system. He was a pioneer in the field of mathematical origami as well. He went on to join MIT, where he invented a very clever way of compressing code and it went to etched in history as the Huffman Coding. David Huffman published it when he was a ScD student at MIT in 1952. He has other accolades that he is proud of such as “The Synthesis of Sequential Switching Circuits”. He is forever remembered for his invention and has a special place in the Data Structures and Analysis world. While many might wonder what’s the specialty that this algorithm holds, lets take a closer look into it.
Huffman Coding is a data compression technique that reduces the size of data without sacrificing any of the features. David Huffman cleverly came up with a way to do this. Therefore, Huffman Coding is commonly used to compress data that contains frequently recurring characters.
To understand the specialty of this lets take an example to simplify it in layman terms.
Taking the example of the above code, we can clearly notice that there are a recurrence of many alphabets consistently throughout the given code. Such as 2 A’s and 3 D’s and so on. To reduce this recurrence size we use the huffman coding. Each character is 8 bits long. The above string contains a total of 15 characters. This string requires a total of 8 * 15 = 120 bits to convey.
We can compress the string to a smaller size using the Huffman Coding technique. Huffman coding forms a tree utilizing the character’s frequencies before generating code for each character. Data must be decoded once it has been encoded. The same tree is used for decoding.
At first we calculate the amount of alphabets present in total, so in this particular case we have:
Once we calculate the frequency we simply have to arrange it in ascending order to proceed ahead according to Huffman’s Method. We also call this as Priority Queue in the terms of DAA.
As soon as we arrange it in ascending order, we have to sum the two lowest values, which in this case are 1,3 and create a new node above the two lowest nodes. So it adds up to 4, after that, connect the next lowest node to which is 5 to 4. Just as we did earlier, we add the two values and create a new node. So in this particular case, it sums up to 9. Now the only node left that needs to be added is 6. So, we connect 9 to 6 and add them together. Where it is 15.
Now, 15 is the head node, on each branch on the left we assign 0, and on the right hand side we assign 1. Now to get the reassigned code, we simply check code on the branches leading to the specific alphabets in each case. By doing this we get;
A= 11; 5*2=10
Now finally we simply multiply the frequency with the code and add it to get the total downsized bits.
After adding we get a total sum of 28 Bits.