Huffman Coding

Huffman coding is a lossless data compression algorithm. The idea is to assign variable-length codes to input characters and lengths of the assigned codes are based on the frequencies of corresponding characters. 

The variable-length codes assigned to input characters are Prefix Codes, means the codes (bit sequences) are assigned in such a way that the code assigned to one character is not the prefix of code assigned to any other character. This is how Huffman Coding makes sure that there is no ambiguity when decoding the generated bitstream.

Let us understand prefix codes with a counter example. Let there be four characters a, b, c and d, and their corresponding variable length codes be 00, 01, 0 and 1. This coding leads to ambiguity because code assigned to c is the prefix of codes assigned to a and b. If the compressed bit stream is 0001, the de-compressed output may be “cccd” or “ccb” or “acd” or “ab”. 

There are mainly two major parts in Huffman Coding:

  1. Build a Huffman Tree from input characters.
  2. Traverse the Huffman Tree and assign codes to characters.

Steps to build Huffman Tree:

Input is an array of unique characters along with their frequency of occurrences and output is Huffman Tree. 

  1. Create a leaf node for each unique character and build a min heap of all leaf nodes (Min Heap is used as a priority queue. The value of frequency field is used to compare two nodes in min heap. Initially, the least frequent character is at root)
  2. Extract two nodes with the minimum frequency from the min heap.
     
  3. Create a new internal node with a frequency equal to the sum of the two nodes frequencies. Make the first extracted node as its left child and the other extracted node as its right child. Add this node to the min heap.
  4. Repeat steps#2 and #3 until the heap contains only one node. The remaining node is the root node and the tree is complete.
    Let us understand the algorithm with an example:
character   Frequency
a 5
b 9
c 12
d 13
e 16
f 45

After Creating Huffman Tree for these characters, our tree will look like below:

Huffman Tree

Huffman codes for below characters will look like this:

character     code-word
f 0
c 100
d 101
a 1100
b 1101
e 111

Application of Huffman Coding:

Below are some of the applications of Huffman coding in the real life

  • Lossless Image Compression
  • Using Structures in Images
  • Text Compression
  • Audio Compression

We are given a set of items, each with a weight and a value, and we want to find the most valuable subset of items that we can fit into a knapsack with capacity W. The catch is that we can take fractional amounts of each item, so an item can be present in fractional form.

Problem Statement: Given the weights and profits of N items, in the form of {profit, weight} put these items in a knapsack of capacity W to get the maximum total profit in the knapsack. We are allowed to take fractional values in the knapsack.

Example:

Input: arr[] = {{60, 10}, {100, 20}, {120, 30}}, W = 50
Output: 240 
Explanation: By taking items of weight 10 and 20 kg and 2/3 fraction of 30 kg. 
Hence total price will be 60+100+(2/3)(120) = 240

Approach: An efficient solution is to use the Greedy approach. 

The basic idea of the greedy approach is to calculate the ratio profit/weight for each item and sort the item on the basis of this ratio. Then take the item with the highest ratio and add them as much as we can (can be the whole element or a fraction of it).

This will always give the maximum profit because, in each step it adds an element such that this is the maximum possible profit for that much weight.

Time Complexity: O(N * logN)
Auxiliary Space: O(N)

Optimal merge pattern is a pattern that relates to the merging of two or more sorted files in a single sorted file. here, we have two sorted files containing n and m records respectively then they could be merged together, to obtain one sorted file in time O(n+m).

Problem Statement: Given n number of sorted files, the task is to find the minimum computations done to reach the Optimal Merge Pattern. 
When two or more sorted files are to be merged altogether to form a single file, the minimum computations are done to reach this file are known as Optimal Merge Pattern.

If more than 2 files need to be merged then it can be done in pairs. For example, if need to merge 4 files A, B, C, D. First Merge A with B to get X1, merge X1 with C to get X2, merge X2 with D to get X3 as the output file.

If we have two files of sizes m and n, the total computation time will be m+n. Here, we use the greedy strategy by merging the two smallest size files among all the files present.

Example:

Input: n = 6, size = {2, 3, 4, 5, 6, 7} 
Output: 68 
Explanation: Optimal way to combine these files 

Approach: 

Node represents a file with a given size also given nodes are greater than 2 

  1. Add all the nodes in a priority queue (Min Heap).{pq.poll = file size}
  2. Initialize count = 0 // variable to store file computations.
  3. Repeat while (size of priority Queue is greater than 1) 
    1. int weight = pq.poll(); pq.pop;//pq denotes priority queue, remove 1st smallest and pop(remove) it out
    2. weight+=pq.poll()  && pq.pop(); // add the second element and then pop(remove) it out
    3. count +=weight;
    4. pq.add(weight) // add this combined cost to priority queue;  
  4. count is the final answer

Time Complexity: O(nlogn)
Auxiliary Space: O(n)

It is a greedy algorithm that is used to find the MST from a graph. Prim’s algorithm finds the subset of edges that includes every vertex of the graph such that the sum of the weights of the edges is minimized.

Prim’s algorithm starts with the single node and explores all the adjacent nodes with all the connecting edges at each step. The edges with the minimal weights causing no cycles in the graph got selected.

How does Prim’s Algorithm Work? 

The working of Prim’s algorithm can be described by using the following steps:

Step 1: Determine an arbitrary vertex as the starting vertex of the MST.
Step 2: Follow steps 3 to 5 till there are vertices that are not included in the MST (known as fringe vertex).
Step 3: Find edges connecting any tree vertex with the fringe vertices.
Step 4: Find the minimum among these edges.
Step 5: Add the chosen edge to the MST if it does not form any cycle.
Step 6: Return the MST and exit

Time Complexity: O(V2), If the input graph is represented using an adjacency list, then the time complexity of Prim’s algorithm can be reduced to O(E * logV) with the help of a binary heap.  In this implementation, we are always considering the spanning tree to start from the root of the graph
Auxiliary Space: O(V)

Greedy Algorithm Notes for GATE Exam [2024]Fractional Knapsack ProblemOptimal File Merge PatternsPrim’s Algorithm for Minimum Spanning Tree (MST)

In the dynamic landscape of algorithmic design, Greedy Algorithms stand out as powerful tools for solving optimization problems. Aspirants preparing for the GATE Exam 2024 are poised to encounter a range of questions that test their understanding of Greedy Algorithms. These notes aim to provide a concise and insightful overview, unraveling the principles and applications of Greedy Algorithms that are likely to be scrutinized in the upcoming GATE examination.

Table of Content

  • Introduction to Greedy Algorithms:
  • Activity Selection Problem:
  • Job Sequencing Problem
  • Huffman Coding
  • Kruskal’s Minimum Spanning Tree Algorithm
  • Dijkstra’s shortest path algorithm
  • MCQ Questions for Greedy Algorithm

Similar Reads

Introduction to Greedy Algorithms:

What is Greedy Algorithm?...

Activity Selection Problem:

The activity selection​ problem is an optimization problem used to find the maximum number of activities a person can perform if they can only work on one activity at a time....

Job Sequencing Problem

The job sequencing problem states that We have a single processor operating system and a set of jobs that have to be completed with given deadline constraints. Our objective is to maximize the profit, given the condition that only one job can be completed at a given time....

Huffman Coding

Huffman coding is a lossless data compression algorithm. The idea is to assign variable-length codes to input characters and lengths of the assigned codes are based on the frequencies of corresponding characters....

Kruskal’s Minimum Spanning Tree Algorithm

Kruskal’s algorithm is a well-known algorithm for finding the minimum spanning tree of a graph. It is a greedy algorithm that makes use of the fact that the edges of a minimum spanning tree must form a subset of the edges of any other spanning tree. In Kruskal’s algorithm, sort all edges of the given graph in increasing order. Then it keeps on adding new edges and nodes in the MST if the newly added edge does not form a cycle. It picks the minimum weighted edge at first and the maximum weighted edge at last. Thus we can say that it makes a locally optimal choice in each step in order to find the optimal solution. Hence this is a Greedy Algorithm....

Dijkstra’s shortest path algorithm

Dijkstra’s algorithm is a popular algorithms for solving many single-source shortest path problems having non-negative edge weight in the graphs i.e., it is to find the shortest distance between two vertices on a graph. It was conceived by Dutch computer scientist Edsger W. Dijkstra in 1956. The algorithm maintains a set of visited vertices and a set of unvisited vertices. It starts at the source vertex and iteratively selects the unvisited vertex with the smallest tentative distance from the source. It then visits the neighbours of this vertex and updates their tentative distances if a shorter path is found. This process continues until the destination vertex is reached, or all reachable vertices have been visited....

Previously Asked GATE Questions for Greedy Algorithm

Question 1: Suppose the letters a, b, c, d, e, f have probabilities 1/2, 1/4, 1/8, 1/16, 1/32, 1/32 respectively. Which of the following is the Huffman code for the letter a, b, c, d, e, f? GATE-2007...