What is Subset Load Balancing?

Designing on a larger scale: Distributed Hashing

As the name suggests, subset load balancing partitions the available nodes into multiple subsets and distributes the workload across these smaller groups of resources. This helps the system handle more traffic, reduce response times, and improve reliability and fault tolerance. Subsetting therefore improves resource availability and scalability while reducing overall latency.

The key concepts related to subsetting in load balancing are:

Partitioning: Partitioning involves breaking down the data or workload into subsets. Partitioning can be done in various ways, including hash-based partitioning, range-based partitioning, and list-based partitioning.
Load Balancing or Distribution of Traffic: It involves assigning the subsets to different nodes in the system to distribute the workload evenly. Load balancing can be achieved using various algorithms, including round-robin, weighted round-robin, least connections, and IP hash.
Failover: Failover involves ensuring that if one node in the system fails, the workload assigned to that node is transferred to another node in the system. Failover can be achieved using various techniques, including active-passive failover, active-active failover, and hot standby.
Monitoring: Monitoring involves tracking the performance of the nodes in the system and taking corrective action if necessary. Monitoring can be achieved using various tools, including Nagios, Zabbix, and Prometheus.
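The partitioning and failover ideas above can be sketched in a few lines of Python. This is only an illustrative sketch, not a production design: the helper names (stable_hash, partition_nodes, subset_for_client, pick_node) and the node names are hypothetical, and it assumes a set of healthy nodes is supplied by some external monitoring component.

```python
import hashlib


def stable_hash(key: str) -> int:
    """Deterministic integer hash (the built-in hash() of str is salted per process)."""
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big")


def partition_nodes(nodes, subset_size):
    """Partitioning: split the node list into consecutive subsets."""
    return [nodes[i:i + subset_size] for i in range(0, len(nodes), subset_size)]


def subset_for_client(client_id, subsets):
    """Hash-based assignment: the same client always maps to the same subset."""
    return subsets[stable_hash(client_id) % len(subsets)]


def pick_node(client_id, subsets, healthy):
    """Prefer a healthy node from the client's own subset; otherwise fail over."""
    for node in subset_for_client(client_id, subsets):
        if node in healthy:
            return node
    # Failover: the whole subset is down, so fall back to any healthy node.
    for subset in subsets:
        for node in subset:
            if node in healthy:
                return node
    raise RuntimeError("no healthy nodes available")


nodes = [f"node-{i}" for i in range(6)]   # hypothetical node names
subsets = partition_nodes(nodes, 2)       # three subsets of two nodes each
print(pick_node("client-42", subsets, healthy=set(nodes)))
```

Because each client only ever talks to its own small subset while that subset is healthy, the number of connections per node stays bounded even as the total number of nodes grows.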

How does Hashing help in Subset Load Balancing?

Hashing is the process of mapping keys to positions in a hash table by using a hash function, so that elements can be accessed quickly. The efficiency of the mapping depends on the efficiency of the hash function used.

A hash function maps a piece of data, such as a structure or object, to a fixed-size integer value known as the hash code (for example, SHA-256 produces a 256-bit value). One common way to put hashing to use is a hash table, also called a hash map.
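As a small illustration of a hash function producing an integer hash code, the sketch below uses SHA-256 from Python's standard hashlib module. The function name hash_code is hypothetical, and the sketch assumes the objects being hashed are strings.

```python
import hashlib


def hash_code(obj: str) -> int:
    """Map a string to a 256-bit integer using SHA-256."""
    digest = hashlib.sha256(obj.encode("utf-8")).digest()  # 32 bytes
    return int.from_bytes(digest, "big")


code = hash_code("server-a")
print(code.bit_length() <= 256)         # the hash code fits in 256 bits
print(code == hash_code("server-a"))    # the same input always gives the same code
```

A cryptographic hash such as SHA-256 is slower than the non-cryptographic hashes typically used inside hash tables, but it is deterministic across processes and machines, which matters when several nodes must agree on the same mapping.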

Hash Tables

To build such a hash table directly, we would need an array with one slot for every possible hash value, which is impractical because a good hash function produces 32-bit or 64-bit outputs (2^32 or 2^64 possible values). To overcome this, we use a reasonably sized array of N slots and reduce the hash code modulo N:

index = hash_func(object) % N

A second problem is that the resulting indices are not unique: different objects can hash to the same index, so collisions will occur and a simple direct index will not work. One way to handle this is to keep a bucket of values for each index. To add a new object, we calculate its index, check whether the object already exists in that bucket, and add it if it does not. Although searches within a bucket are linear, a properly sized hash table has a small number of objects per bucket, so access takes almost constant time on average: O(M/N), where M is the number of stored objects and N is the number of buckets (the array size used above).
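The bucket approach described above is often called separate chaining. The following is a minimal sketch of it, assuming a fixed number of buckets and Python's built-in hash() as the hash function; the class name ChainedHashTable and its methods are hypothetical.

```python
class ChainedHashTable:
    """Hash table that resolves collisions with one list (bucket) per index."""

    def __init__(self, num_buckets=8):
        self.buckets = [[] for _ in range(num_buckets)]
        self.size = 0

    def _index(self, key):
        # index = hash_func(object) % N
        return hash(key) % len(self.buckets)

    def put(self, key, value):
        bucket = self.buckets[self._index(key)]
        for i, (existing_key, _) in enumerate(bucket):
            if existing_key == key:       # key already present: overwrite
                bucket[i] = (key, value)
                return
        bucket.append((key, value))       # new key: add to the bucket
        self.size += 1

    def get(self, key, default=None):
        # Linear search, but only within a single bucket.
        for existing_key, value in self.buckets[self._index(key)]:
            if existing_key == key:
                return value
        return default

    def load_factor(self):
        """Average number of stored objects per bucket."""
        return self.size / len(self.buckets)


table = ChainedHashTable(num_buckets=8)
table.put(0, "a")
table.put(8, "b")    # 0 and 8 land in the same bucket when there are 8 buckets
print(table.get(0), table.get(8), table.load_factor())
```

This sketch never resizes; a real hash table would typically grow the bucket array once the load factor passes some threshold, which is what keeps buckets short.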

Load Balancing through Subsets in Distributed System

Before diving into subsetting in load balancing, we should first understand what load balancing is and why subsetting matters for it.

Load balancing is the process of distributing incoming network traffic or workload across multiple servers or nodes in a networked system. Its main aim is to optimize resource utilization, maximize throughput, minimize response time, and avoid overloading any single server or resource.
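To make the idea of distributing requests concrete, here is a rough sketch of two common selection strategies, round-robin and least connections. The class names and server names are hypothetical, and the sketch assumes a single-threaded balancer that is told when a request finishes.

```python
from itertools import cycle


class RoundRobinBalancer:
    """Hand out servers in a fixed rotating order."""

    def __init__(self, servers):
        self._rotation = cycle(list(servers))

    def pick(self):
        return next(self._rotation)


class LeastConnectionsBalancer:
    """Send each new request to the server with the fewest active requests."""

    def __init__(self, servers):
        self.active = {server: 0 for server in servers}

    def acquire(self):
        # Ties are broken by insertion order of the servers.
        server = min(self.active, key=self.active.get)
        self.active[server] += 1
        return server

    def release(self, server):
        self.active[server] -= 1


servers = ["a", "b", "c"]   # hypothetical server names
rr = RoundRobinBalancer(servers)
print([rr.pick() for _ in range(4)])

lc = LeastConnectionsBalancer(servers)
first = lc.acquire()
lc.acquire()
lc.release(first)            # the first request has finished
print(lc.acquire())          # the freed server is chosen again
```

Round-robin ignores how busy each server is, whereas least connections adapts to uneven request durations at the cost of tracking per-server state.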