Huffman Coding

Kruskal’s Minimum Spanning Tree Algorithm

Huffman coding is a lossless data compression algorithm. The idea is to assign variable-length codes to input characters and lengths of the assigned codes are based on the frequencies of corresponding characters.

The variable-length codes assigned to input characters are Prefix Codes, means the codes (bit sequences) are assigned in such a way that the code assigned to one character is not the prefix of code assigned to any other character. This is how Huffman Coding makes sure that there is no ambiguity when decoding the generated bitstream.

Let us understand prefix codes with a counter example. Let there be four characters a, b, c and d, and their corresponding variable length codes be 00, 01, 0 and 1. This coding leads to ambiguity because code assigned to c is the prefix of codes assigned to a and b. If the compressed bit stream is 0001, the de-compressed output may be “cccd” or “ccb” or “acd” or “ab”.

There are mainly two major parts in Huffman Coding:

Build a Huffman Tree from input characters.
Traverse the Huffman Tree and assign codes to characters.

Steps to build Huffman Tree:

Input is an array of unique characters along with their frequency of occurrences and output is Huffman Tree.

Create a leaf node for each unique character and build a min heap of all leaf nodes (Min Heap is used as a priority queue. The value of frequency field is used to compare two nodes in min heap. Initially, the least frequent character is at root)
Extract two nodes with the minimum frequency from the min heap.
Create a new internal node with a frequency equal to the sum of the two nodes frequencies. Make the first extracted node as its left child and the other extracted node as its right child. Add this node to the min heap.
Repeat steps#2 and #3 until the heap contains only one node. The remaining node is the root node and the tree is complete.
Let us understand the algorithm with an example:

character   Frequency
    a            5
    b           9
    c           12
    d           13
    e           16
    f           45

After Creating Huffman Tree for these characters, our tree will look like below:

Huffman Tree

Huffman codes for below characters will look like this:

character     code-word
    f                        0
    c                       100
    d                       101
    a                       1100
    b                       1101
    e                        111

Application of Huffman Coding:

Below are some of the applications of Huffman coding in the real life

Lossless Image Compression
Using Structures in Images
Text Compression
Audio Compression

Fractional Knapsack Problem

We are given a set of items, each with a weight and a value, and we want to find the most valuable subset of items that we can fit into a knapsack with capacity W. The catch is that we can take fractional amounts of each item, so an item can be present in fractional form.

Problem Statement: Given the weights and profits of N items, in the form of {profit, weight} put these items in a knapsack of capacity W to get the maximum total profit in the knapsack. We are allowed to take fractional values in the knapsack.

Example:

Input: arr[] = {{60, 10}, {100, 20}, {120, 30}}, W = 50
Output: 240
Explanation: By taking items of weight 10 and 20 kg and 2/3 fraction of 30 kg.
Hence total price will be 60+100+(2/3)(120) = 240

Approach: An efficient solution is to use the Greedy approach.

The basic idea of the greedy approach is to calculate the ratio profit/weight for each item and sort the item on the basis of this ratio. Then take the item with the highest ratio and add them as much as we can (can be the whole element or a fraction of it).

This will always give the maximum profit because, in each step it adds an element such that this is the maximum possible profit for that much weight.

Time Complexity: O(N * logN)
Auxiliary Space: O(N)

Optimal File Merge Patterns

Optimal merge pattern is a pattern that relates to the merging of two or more sorted files in a single sorted file. here, we have two sorted files containing n and m records respectively then they could be merged together, to obtain one sorted file in time O(n+m).

Problem Statement: Given n number of sorted files, the task is to find the minimum computations done to reach the Optimal Merge Pattern.
When two or more sorted files are to be merged altogether to form a single file, the minimum computations are done to reach this file are known as Optimal Merge Pattern.

If more than 2 files need to be merged then it can be done in pairs. For example, if need to merge 4 files A, B, C, D. First Merge A with B to get X1, merge X1 with C to get X2, merge X2 with D to get X3 as the output file.

If we have two files of sizes m and n, the total computation time will be m+n. Here, we use the greedy strategy by merging the two smallest size files among all the files present.

Example:

Input: n = 6, size = {2, 3, 4, 5, 6, 7}
Output: 68
Explanation: Optimal way to combine these files

Approach:

Node represents a file with a given size also given nodes are greater than 2

Add all the nodes in a priority queue (Min Heap).{pq.poll = file size}

Initialize count = 0 // variable to store file computations.

Repeat while (size of priority Queue is greater than 1)

int weight = pq.poll(); pq.pop;//pq denotes priority queue, remove 1st smallest and pop(remove) it out

weight+=pq.poll() && pq.pop(); // add the second element and then pop(remove) it out

count +=weight;

pq.add(weight) // add this combined cost to priority queue;

count is the final answer

Time Complexity: O(nlogn)
Auxiliary Space: O(n)

Prim’s Algorithm for Minimum Spanning Tree (MST)

It is a greedy algorithm that is used to find the MST from a graph. Prim’s algorithm finds the subset of edges that includes every vertex of the graph such that the sum of the weights of the edges is minimized.

Prim’s algorithm starts with the single node and explores all the adjacent nodes with all the connecting edges at each step. The edges with the minimal weights causing no cycles in the graph got selected.

How does Prim’s Algorithm Work?

The working of Prim’s algorithm can be described by using the following steps:

Step 1: Determine an arbitrary vertex as the starting vertex of the MST.
Step 2: Follow steps 3 to 5 till there are vertices that are not included in the MST (known as fringe vertex).
Step 3: Find edges connecting any tree vertex with the fringe vertices.
Step 4: Find the minimum among these edges.
Step 5: Add the chosen edge to the MST if it does not form any cycle.
Step 6: Return the MST and exit

Time Complexity: O(V²), If the input graph is represented using an adjacency list, then the time complexity of Prim’s algorithm can be reduced to O(E * logV) with the help of a binary heap. In this implementation, we are always considering the spanning tree to start from the root of the graph
Auxiliary Space: O(V)

Greedy Algorithm Notes for GATE Exam [2024]Fractional Knapsack ProblemOptimal File Merge PatternsPrim’s Algorithm for Minimum Spanning Tree (MST)

In the dynamic landscape of algorithmic design, Greedy Algorithms stand out as powerful tools for solving optimization problems. Aspirants preparing for the GATE Exam 2024 are poised to encounter a range of questions that test their understanding of Greedy Algorithms. These notes aim to provide a concise and insightful overview, unraveling the principles and applications of Greedy Algorithms that are likely to be scrutinized in the upcoming GATE examination.

Table of Content

Introduction to Greedy Algorithms:
Activity Selection Problem:
Job Sequencing Problem
Huffman Coding
Kruskal’s Minimum Spanning Tree Algorithm
Dijkstra’s shortest path algorithm
MCQ Questions for Greedy Algorithm