Hashing Functions

Hashing is the process of generating a value from a text or a list of numbers using a mathematical function known as a hash function.

A hash function converts a given numeric or alphanumeric key to a small, practical integer value. That integer is used as an index in the hash table. In simple terms, a hash function maps a large number or string to a small integer that can be used as an index into the hash table.

A good hash function should have the following properties:

It should be efficiently computable.
It should uniformly distribute the keys (each table position should be equally likely for each key).
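The uniformity property can be checked empirically: hash a sample of keys and count how many land in each slot. A roughly flat count suggests a uniform spread. This is a minimal Python sketch; `slot_counts` is a hypothetical helper, and the remainder function passed to it is only a stand-in hash.

```python
from collections import Counter

def slot_counts(keys, hash_fn, m):
    """Return a list whose i-th entry is the number of keys hashed to slot i (0..m-1)."""
    counts = Counter(hash_fn(k) for k in keys)
    return [counts.get(slot, 0) for slot in range(m)]

# Stand-in hash (assumed for illustration): keys 0..999 into 10 slots.
M = 10
print(slot_counts(range(1000), lambda k: k % M, M))  # ten slots with 100 keys each
```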

Types of Hash Functions

There are many hash functions for numeric or alphanumeric keys. This article covers the following ones:

Division Method.
Mid Square Method.
Digit Folding Method.
Multiplication Method.

Let's discuss these methods in detail:

1. Division Method

This is the simplest method for generating a hash value. The hash function divides the key K by M and uses the remainder as the hash value.

Formula: h(K) = K mod M

Where:

K is the key value
M is the size of the hash table

It is usually best to choose M to be a prime number, since that tends to distribute the keys more uniformly. Because the hash value depends only on the remainder of a division, the choice of M directly determines how keys spread across the table.

Example:

K = 12345
M = 95

h(12345) = 12345 mod 95
         = 90
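The formula translates directly into code. This minimal Python sketch reproduces the example above; the function name `division_hash` is an illustrative choice, not a standard API.

```python
def division_hash(k: int, m: int) -> int:
    """Division method: h(K) = K mod M, where M is the table size."""
    if m <= 0:
        raise ValueError("table size M must be positive")
    # Python's % returns a value in 0..m-1 for positive m, even when k is negative.
    return k % m

print(division_hash(12345, 95))  # 90
```

In languages such as C or Java, the remainder of a negative key can be negative, so the result may need adjusting before it is used as an index.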

Pros:

This method works for any value of M, although some values give a much better distribution than others.
The division method is very fast since it requires only a single division operation.

Cons:

Consecutive keys map to consecutive hash values in the hash table, which can hurt performance when keys arrive in runs.
Extra care is needed when choosing M. For example, if M is a power of 2, the hash value depends only on the lowest-order bits of K.
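Both drawbacks are easy to observe with a handful of keys. In this Python sketch, the keys and table sizes are arbitrary choices for illustration: consecutive keys fill consecutive slots, and with M = 16 (a power of 2) keys that differ only in their higher bits all collide, while the prime M = 17 separates them.

```python
def division_hash(k: int, m: int) -> int:
    """Division method: h(K) = K mod M."""
    return k % m

# Consecutive keys land in consecutive slots.
print([division_hash(k, 97) for k in range(1000, 1005)])  # [30, 31, 32, 33, 34]

# Keys that are multiples of 16 differ only above the low 4 bits.
keys = (16, 32, 48, 64)
print([division_hash(k, 16) for k in keys])  # [0, 0, 0, 0]  (all collide)
print([division_hash(k, 17) for k in keys])  # [16, 15, 14, 13]
```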

2. Mid Square Method

The mid-square method computes the hash value in two steps:

Square the key K, i.e. compute K².
Extract the middle r digits as the hash value.

Formula: h(K) = middle r digits of (K x K)

Example: Suppose the hash table has 100 memory locations (M = 100). Then r = 2, because two digits are needed to address locations 00 through 99.

K = 50
K x K = 50 x 50
      = 2500

We extract the middle two digits:
h(50) = 50
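A minimal Python sketch of the two steps follows. The article does not say which digits to keep when the square's length leaves an uneven number of digits to discard, so this sketch assumes one extra digit is dropped from the right in that case, and that squares shorter than r digits are zero-padded on the left.

```python
def mid_square_hash(k: int, r: int) -> int:
    """Mid-square method: square K and keep the middle r decimal digits."""
    digits = str(k * k).zfill(r)        # decimal digits of K², padded to at least r
    start = (len(digits) - r) // 2      # left offset of the middle window (assumed convention)
    return int(digits[start:start + r])

print(mid_square_hash(50, 2))   # 2500  -> "50"  -> 50
print(mid_square_hash(123, 2))  # 15129 -> "51"  -> 51
```

Python integers have arbitrary precision, so the sketch never overflows; in a language with fixed-width integers, squaring a large key can overflow, which is one form of the key-size limitation described below.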

Pros:

The method performs well because most or all digits of the key contribute to the middle digits of the square, and therefore to the result.
The result is not dominated by the distribution of the top digit or bottom digit of the original key value.

Cons:

The size of the key is a limitation: squaring a key roughly doubles its number of digits, so large keys produce very large squares.
Another disadvantage is that collisions still occur; like other hash functions, this method cannot eliminate them.

3. Digit Folding Method

This method involves two steps:

Divide the key K into n parts k1, k2, k3, ..., kn, where every part has the same number of digits except the last, which may have fewer digits.
Add the parts. The hash value is the sum, ignoring the final carry, if any.

Formula:

K = k1, k2, k3, k4, ... kn
s = k1 + k2 + k3 + k4 + ... + kn
h(K) = s mod 10^d (this drops the final carry; d is the number of digits in each part)

Example:

K = 12345
k1 = 12, k2 = 34, k3 = 5
s = k1 + k2 + k3
s = 12 + 34 + 5
s = 51

h(K) = 51
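The folding steps can be sketched in Python as follows. This assumes non-negative integer keys, parts taken left to right, and that "ignoring the final carry" means keeping only the lowest d digits of the sum (s mod 10^d). The name `fold_hash` is illustrative.

```python
def fold_hash(k: int, part_digits: int) -> int:
    """Digit folding: split K into parts of `part_digits` digits (the last part
    may be shorter), add them, and drop any carry beyond `part_digits` digits."""
    if k < 0:
        raise ValueError("this sketch assumes a non-negative key")
    s = str(k)
    parts = [int(s[i:i + part_digits]) for i in range(0, len(s), part_digits)]
    return sum(parts) % 10 ** part_digits   # ignore the final carry

print(fold_hash(12345, 2))   # 12 + 34 + 5 = 51
print(fold_hash(123456, 2))  # 12 + 34 + 56 = 102 -> carry dropped -> 2
```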

Note: The number of digits in each part depends on the size of the hash table. For example, if the hash table has 100 slots, each part must have two digits, except for the last part, which may have fewer.

4. Multiplication Method

This method involves the following steps:

Choose a constant value A such that 0 < A < 1.
Multiply the key K by A.
Extract the fractional part of KA.
Multiply this fractional part by the size of the hash table, M.
Take the floor of the result to obtain the hash value.

Formula: h(K) = floor(M (KA mod 1))

Where:

M is the size of the hash table.
K is the key value.
A is a constant value.

Example:

K = 1234
A = 0.618033
M = 100

h(1234) = floor (100 (1234 * 0.618033 mod 1))
        = floor (100 (762.652722 mod 1))
        = floor (100 (0.652722))
        = floor (65.2722)
        = 65
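A direct floating-point translation of the formula, as a minimal Python sketch that reproduces the example above. The default A = 0.618033 is taken from the example; `multiplication_hash` is an illustrative name.

```python
import math

def multiplication_hash(k: int, m: int, a: float = 0.618033) -> int:
    """Multiplication method: h(K) = floor(M * (K*A mod 1)), with 0 < A < 1."""
    if not 0 < a < 1:
        raise ValueError("A must satisfy 0 < A < 1")
    frac = (k * a) % 1.0        # fractional part of K*A, in [0, 1)
    return math.floor(m * frac)

print(multiplication_hash(1234, 100))  # 65
```

Floating-point rounding can, in rare cases, change the last step for very large keys; the integer formulation for power-of-2 table sizes avoids this.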

Pros: The value of M is not critical, so the method works well with any table size, including powers of 2. In practice, M is often chosen to be a power of 2 because that makes the computation faster.

Cons: When the table size is not a power of 2, the computation cannot be reduced to an integer multiplication and a bit shift, so it is somewhat slower. Even so, computing an index with multiplication hashing is generally fast.
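When M = 2^p, the multiplication method can be computed with integers only. The sketch below assumes a 32-bit word (W = 32) and a fixed-point constant S = floor(A · 2^32) with A = (√5 − 1)/2, the value Knuth suggests. The low W bits of K·S hold the fractional part of K·A scaled by 2^W, and their top p bits equal floor(M · (KA mod 1)). The names `W`, `S`, and `mult_hash_pow2` are illustrative.

```python
W = 32                  # assumed word size in bits
S = 2654435769          # floor(A * 2**32) for A = (sqrt(5) - 1) / 2

def mult_hash_pow2(k: int, p: int) -> int:
    """Multiplication method for a table of size M = 2**p using integer ops only."""
    if not 0 < p <= W:
        raise ValueError("p must be in 1..W")
    r0 = (k * S) & ((1 << W) - 1)   # (K*S) mod 2**W: scaled fractional part of K*A
    return r0 >> (W - p)            # top p bits of r0 = floor(M * frac(K*A))

print(mult_hash_pow2(123456, 14))  # 67
```

Here the multiply, mask, and shift replace the floating-point multiply, fractional-part extraction, and floor; that shortcut is only available because M is a power of 2.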
