Designing on a larger scale: Distributed Hashing

Scaling out is a technique that involves adding more nodes to the system to increase its capacity. 

Distributed hashing is a load-balancing technique that partitions data based on its hash value. Sometimes it is necessary or desirable to split a hash table into several parts, hosted by different servers. Each node in the system is responsible for a range of hash values, and any data item whose hash falls in that range is assigned to that node. One reason to do this is to bypass the memory limitations of a single computer, allowing the construction of arbitrarily large hash tables, given enough servers.

Example:

Suppose we have four nodes (servers) in our system and want to partition the data based on its hash value. We can use the following table to map hash values to nodes:

Node    Range of Hash Values
1       0 – 25
2       26 – 50
3       51 – 75
4       76 – 100

Suppose we have a data item with a hash value of 35. According to the table, this data item should be assigned to node 2. Similarly, a data item with a hash value of 85 should be assigned to node 4.
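The lookup above can be sketched in a few lines of Python. This is a minimal illustration, assuming node 3 covers 51–75 and node 4 covers 76–100; the function name `node_for` is made up for the example.

```python
# Each node owns a fixed range of hash values, mirroring the table above.
RANGES = [
    (1, 0, 25),
    (2, 26, 50),
    (3, 51, 75),
    (4, 76, 100),
]

def node_for(hash_value):
    """Return the node responsible for a hash value in [0, 100]."""
    for node, low, high in RANGES:
        if low <= hash_value <= high:
            return node
    raise ValueError("hash value out of range")

print(node_for(35))  # 2
print(node_for(85))  # 4
```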

Distributed hashing with a lookup table like this spreads the workload across all the nodes in the system, evenly so long as the hash function distributes values uniformly. It also makes each node responsible for a specific range of hash values, which makes the system easier to manage.

Why does distributed hashing fail with a variable number of servers?

Distributed hashing is intuitive and easy to implement, and it works well until the number of servers changes. Suppose one of the servers crashes or becomes unavailable, or we decide to add another server. The mapping from hash values to nodes must then be recomputed, and most data items end up assigned to a different node, forcing large-scale data movement and degrading performance.
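The effect is easy to demonstrate with a simple modulo placement scheme (a common assignment rule, assumed here for illustration): when the server count goes from 4 to 5, the vast majority of keys change servers.

```python
# Sketch of why modulo-based distribution breaks when the server
# count changes: most keys land on a different server afterwards.

def server_for(key, num_servers):
    return hash(key) % num_servers  # simple modulo placement

keys = [f"key-{i}" for i in range(1000)]
before = {k: server_for(k, 4) for k in keys}
after = {k: server_for(k, 5) for k in keys}

moved = sum(1 for k in keys if before[k] != after[k])
print(f"{moved / len(keys):.0%} of keys moved")  # typically around 80%
```

A key stays put only when its hash gives the same remainder mod 4 and mod 5, which happens for roughly one key in five; everything else must be rehashed and relocated.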

Load Balancing through Subsets in Distributed System

Before diving into subsetting, we should first understand what load balancing is and why subsetting matters for it.

Load balancing is the process of distributing incoming network traffic or workload across multiple servers or nodes in a network system. The main aim of load balancing is to optimize resource utilization, maximize throughput, and minimize the response time of, and overload on, any single server or resource.
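As a minimal sketch of the idea, a round-robin dispatcher spreads requests evenly across a pool of servers. The server names here are illustrative, not from the article.

```python
import itertools

# Round-robin load balancing: cycle through the servers in order,
# so each one receives an equal share of the incoming requests.
servers = ["server-a", "server-b", "server-c"]
rr = itertools.cycle(servers)

assignments = [next(rr) for _ in range(9)]
print(assignments)  # each server is assigned exactly 3 of the 9 requests
```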


What is Subset Load Balancing?

As the name suggests, subset load balancing partitions the pool of available nodes into multiple subsets and distributes the workload among these smaller groups of resources. This helps the system handle more traffic, reduce response times, and increase reliability and fault tolerance. Using subsets also enhances resource availability and scalability by reducing overall latency.


Consistent Hashing – A Complete Solution:

One distribution scheme that doesn't depend on the number of servers is Consistent Hashing.
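The core idea can be sketched with a minimal hash ring (an illustrative implementation, not a production one, and without the virtual nodes a real deployment would add): servers and keys are hashed onto the same circular space, and each key belongs to the first server clockwise from it. Removing a server only reassigns the keys in its arc.

```python
import bisect
import hashlib

def ring_hash(s):
    # Stable hash so placement doesn't change between runs.
    return int(hashlib.md5(s.encode()).hexdigest(), 16)

class ConsistentHashRing:
    def __init__(self, servers):
        # Sorted (hash, server) pairs form the ring.
        self.ring = sorted((ring_hash(s), s) for s in servers)

    def server_for(self, key):
        h = ring_hash(key)
        # First server clockwise from the key's position.
        idx = bisect.bisect(self.ring, (h, ""))
        if idx == len(self.ring):
            idx = 0  # wrap around the ring
        return self.ring[idx][1]

ring = ConsistentHashRing(["s1", "s2", "s3", "s4"])
print(ring.server_for("user-42"))
```

Unlike the modulo scheme, dropping "s4" from this ring leaves every key that was not on "s4" exactly where it was, which is what makes consistent hashing robust to a variable number of servers.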