Designing on a larger scale: Distributed Hashing

Scaling out is a technique that involves adding more nodes to the system to increase its capacity. 

Distributed hashing is a load-balancing technique that partitions data based on its hash value. Sometimes it is necessary or desirable to split a hash table into several parts, hosted by different servers. Each node in the system is responsible for a range of hash values, and any data item whose hash falls in that range is assigned to that node. One reason to do this is to bypass the memory limitations of a single computer, allowing the construction of arbitrarily large hash tables, given enough servers.

Example:

Suppose we have four nodes (servers) in our system and want to partition the data based on its hash value. We can use the following table to map hash values to nodes:

Node    Range of Hash Values
1       0 – 25
2       26 – 50
3       51 – 75
4       76 – 100

Suppose we have a data item with a hash value of 35. According to the table, this data item should be assigned to node 2. Similarly, a data item with a hash value of 85 should be assigned to node 4.
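The lookup above can be sketched in a few lines of Python. This is a minimal illustration, assuming node 3 covers 51–75 and node 4 covers 76–100; the function name `node_for` is made up for the example.

```python
# Each node owns a fixed range of hash values, mirroring the table above.
RANGES = [
    (1, 0, 25),
    (2, 26, 50),
    (3, 51, 75),
    (4, 76, 100),
]

def node_for(hash_value):
    """Return the node responsible for a hash value in [0, 100]."""
    for node, low, high in RANGES:
        if low <= hash_value <= high:
            return node
    raise ValueError("hash value out of range")

print(node_for(35))  # 2
print(node_for(85))  # 4
```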

Distributed hashing with a lookup table like this spreads the workload across all the nodes in the system, evenly so long as the hash function distributes values uniformly. It also makes each node responsible for a specific range of hash values, which makes the system easier to manage.

Why does distributed hashing fail with a variable number of servers?

Distributed hashing is intuitive and easy to implement, and it works well until the number of servers changes. Suppose one of the servers crashes or becomes unavailable, or we decide to add another server. The mapping from hash values to nodes must then be recomputed, and most data items end up assigned to a different node, forcing large-scale data movement and degrading performance.
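The effect is easy to demonstrate with a simple modulo placement scheme (a common assignment rule, assumed here for illustration): when the server count goes from 4 to 5, the vast majority of keys change servers.

```python
# Sketch of why modulo-based distribution breaks when the server
# count changes: most keys land on a different server afterwards.

def server_for(key, num_servers):
    return hash(key) % num_servers  # simple modulo placement

keys = [f"key-{i}" for i in range(1000)]
before = {k: server_for(k, 4) for k in keys}
after = {k: server_for(k, 5) for k in keys}

moved = sum(1 for k in keys if before[k] != after[k])
print(f"{moved / len(keys):.0%} of keys moved")  # typically around 80%
```

A key stays put only when its hash gives the same remainder mod 4 and mod 5, which happens for roughly one key in five; everything else must be rehashed and relocated.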

Load Balancing through Subsets in Distributed System

Before diving into subsetting, we should first understand what load balancing is and why subsetting matters for it.

Load balancing is the process of distributing incoming network traffic or workload across multiple servers or nodes in a network system. The main aim of load balancing is to optimize resource utilization, maximize throughput, and minimize the response time of, and overload on, any single server or resource.
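As a minimal sketch of the idea, a round-robin dispatcher spreads requests evenly across a pool of servers. The server names here are illustrative, not from the article.

```python
import itertools

# Round-robin load balancing: cycle through the servers in order,
# so each one receives an equal share of the incoming requests.
servers = ["server-a", "server-b", "server-c"]
rr = itertools.cycle(servers)

assignments = [next(rr) for _ in range(9)]
print(assignments)  # each server is assigned exactly 3 of the 9 requests
```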


What is Subset Load Balancing?

As the name suggests, subset load balancing partitions the pool of available nodes into multiple subsets and distributes the workload among these smaller groups of resources. This helps the system handle more traffic, reduce response times, and increase reliability and fault tolerance. Using subsets also enhances resource availability and scalability by reducing overall latency.


Consistent Hashing – A Complete Solution:

One distribution scheme that doesn't depend on the number of servers is Consistent Hashing.
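The core idea can be sketched with a minimal hash ring (an illustrative implementation, not a production one, and without the virtual nodes a real deployment would add): servers and keys are hashed onto the same circular space, and each key belongs to the first server clockwise from it. Removing a server only reassigns the keys in its arc.

```python
import bisect
import hashlib

def ring_hash(s):
    # Stable hash so placement doesn't change between runs.
    return int(hashlib.md5(s.encode()).hexdigest(), 16)

class ConsistentHashRing:
    def __init__(self, servers):
        # Sorted (hash, server) pairs form the ring.
        self.ring = sorted((ring_hash(s), s) for s in servers)

    def server_for(self, key):
        h = ring_hash(key)
        # First server clockwise from the key's position.
        idx = bisect.bisect(self.ring, (h, ""))
        if idx == len(self.ring):
            idx = 0  # wrap around the ring
        return self.ring[idx][1]

ring = ConsistentHashRing(["s1", "s2", "s3", "s4"])
print(ring.server_for("user-42"))
```

Unlike the modulo scheme, dropping "s4" from this ring leaves every key that was not on "s4" exactly where it was, which is what makes consistent hashing robust to a variable number of servers.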