Double Hashing in Python

Double hashing is a collision resolution technique used in hash tables. It works by using two hash functions to compute two different hash values for a given key. The first hash function is used to compute the initial hash value, and the second hash function is used to compute the step size for the probing sequence. In this article, we’ll explore what double hashing actually is and its implementation using Python.

What is Double Hashing?

Double hashing is a collision resolution technique that involves using two hash functions to calculate the index where a data item should be placed in a hash table. When a collision occurs (i.e., when two items map to the same index), the second hash function is applied iteratively until an empty slot is found. This process ensures that items are distributed more evenly throughout the hash table, reducing the likelihood of collisions and improving overall performance.

Implementation in Python:

Below is the implementation of Double Hashing in Python:

Python
from typing import List
import math

MAX_SIZE = 10000001


class DoubleHash:
    def __init__(self, n: int):
        self.TABLE_SIZE = n
        self.PRIME = self.__get_largest_prime(n - 1)
        self.keysPresent = 0
        self.hashTable = [-1] * n

    def __get_largest_prime(self, limit: int) -> int:
        is_prime = [True] * (limit + 1)
        is_prime[0], is_prime[1] = False, False
        for i in range(2, int(math.sqrt(limit)) + 1):
            if is_prime[i]:
                for j in range(i * i, limit + 1, i):
                    is_prime[j] = False
        for i in range(limit, -1, -1):
            if is_prime[i]:
                return i

    def __hash1(self, value: int) -> int:
        return value % self.TABLE_SIZE

    def __hash2(self, value: int) -> int:
        return self.PRIME - (value % self.PRIME)

    def is_full(self) -> bool:
        return self.TABLE_SIZE == self.keysPresent

    def insert(self, value: int) -> None:
        if value == -1 or value == -2:
            print("ERROR : -1 and -2 can't be inserted in the table")
            return
        if self.is_full():
            print("ERROR : Hash Table Full")
            return
        probe, offset = self.__hash1(value), self.__hash2(value)
        while self.hashTable[probe] != -1:
            if -2 == self.hashTable[probe]:
                break
            probe = (probe + offset) % self.TABLE_SIZE
        self.hashTable[probe] = value
        self.keysPresent += 1

    def erase(self, value: int) -> None:
        if not self.search(value):
            return
        probe, offset = self.__hash1(value), self.__hash2(value)
        while self.hashTable[probe] != -1:
            if self.hashTable[probe] == value:
                self.hashTable[probe] = -2
                self.keysPresent -= 1
                return
            else:
                probe = (probe + offset) % self.TABLE_SIZE

    def search(self, value: int) -> bool:
        probe, offset, initialPos, firstItr = self.__hash1(
            value), self.__hash2(value), self.__hash1(value), True
        while True:
            if self.hashTable[probe] == -1:
                break
            elif self.hashTable[probe] == value:
                return True
            elif probe == initialPos and not firstItr:
                return False
            else:
                probe = (probe + offset) % self.TABLE_SIZE
            firstItr = False
        return False

    def print(self) -> None:
        print(*self.hashTable, sep=', ')


if __name__ == '__main__':
    myHash = DoubleHash(13)

    # Inserts random element in the hash table
    insertions = [115, 12, 87, 66, 123]
    for insertion in insertions:
        myHash.insert(insertion)
    print("Status of hash table after initial insertions : ", end="")
    myHash.print()

    # Searches for random element in the hash table, and prints them if found.
    queries = [1, 12, 2, 3, 69, 88, 115]
    n2 = len(queries)
    print("\nSearch operation after insertion : ")

    for i in range(n2):
        if myHash.search(queries[i]):
            print(queries[i], "present")

    # Deletes random element from the hash table.
    deletions = [123, 87, 66]
    n3 = len(deletions)

    for i in range(n3):
        myHash.erase(deletions[i])

    print("Status of hash table after deleting elements : ", end='')
    myHash.print()

Output
Status of hash table after initial insertions : -1, 66, -1, -1, -1, -1, 123, -1, -1, 87, -1, 115, 12

Search operation after insertion : 
12 present
115 present
Status of hash table after deleting ele...

Benefits of Double Hashing:

  • Reduced Collisions: Double hashing distributes items more evenly throughout the hash table, reducing the likelihood of collisions.
  • Improved Performance: By resolving collisions efficiently, double hashing ensures faster data retrieval and manipulation.
  • Simplicity: Double hashing is relatively easy to implement and understand compared to other collision resolution techniques like chaining or open addressing

By distributing items more evenly than techniques such as linear probing, double hashing can improve the performance of hash tables significantly and reduce collisions as well. In Python, the implementation of double hashing is plain and simple and can make a big difference in scenarios where efficient data storage and retrieval are essential. By learning and using double hashing, programmers can build more substantial and extensive applications.