Data Structure and Algorithm
Data Structure
Hashing
Double Hashing

Double Hashing

Double hashing is a collision resolution technique used in open-addressed hash tables. It works by using two hash functions to compute two different hash values for a given key. The first hash function is used to compute the initial hash value, and the second hash function is used to compute the step size for the probing sequence.

Double hashing has the ability to have a low collision rate, as it uses two hash functions to compute the hash value and the step size. This means that the probability of a collision occurring is lower than in other collision resolution techniques such as linear probing or quadratic probing.

However, double hashing has a few drawbacks. First, it requires the use of two hash functions, which can increase the computational complexity of the insertion and search operations. Second, it requires a good choice of hash functions to achieve good performance. If the hash functions are not well-designed, the collision rate may still be high.

Advantages of Double Hashing

  • The advantage of Double hashing is that it is one of the best forms of probing, producing a uniform distribution of records throughout a hash table.
  • This technique does not yield any clusters (unlike linear probing which suffers from primary clustering).
  • It is one of the effective methods for resolving collisions.

How Double Hashing Works

Double hashing can be done using the following formula:

hash(key, i) = (hash1(key) + i * hash2(key)) % TABLE_SIZE

Here hash1() and hash2() are hash functions and TABLE_SIZE is the size of the hash table. (We repeat by increasing i (0, 1, 2...) when a collision occurs)

  • The first hash function is typically: hash1(key) = key % TABLE_SIZE
  • A popular second hash function is: hash2(key) = PRIME - (key % PRIME) (where PRIME is a prime number smaller than the TABLE_SIZE)

Requirements for a good second Hash function:

  1. It must never evaluate to zero (otherwise the step size would be zero, resulting in an infinite loop on the same slot).
  2. It must make sure that all cells can be probed.

Example

Let's say TABLE_SIZE = 13.

  • Hash1(key) = key % 13
  • Hash2(key) = 7 - (key % 7)

If we want to insert key 10, and Hash1(10) results in a collision:

  • Probe 1 (i = 1): (Hash1(10) + 1 * Hash2(10)) % 13
  • Probe 2 (i = 2): (Hash1(10) + 2 * Hash2(10)) % 13 ...and so on until an empty slot is found.