We will find out what are Hash functions, Hashcode, Collision and Rehashing.
Since HashMap and HashSet both use Hashing Technique we will also be able to answer
- Internal working of HashMap
- Internal working of HashSet
Hash functions are used in hash tables, to quickly locate a data record (e.g., a dictionary definition) given its search key.
A hash function has the following properties
- it always returns a number for an object.
- two equal objects will always have the same number
- two unequal objects do not always have different numbers
The procedure of storing objects using a hash function
- Create an array of size M.
- Choose a hash function h, that is a mapping from objects into integers 0, 1, …, M-1.
- Put these objects into an array at indexes computed via the hash function index = h(object). Such an array is called a hash table.
How to choose a hash function?
- The hashCode() method is implemented in the Object class and therefore each class in Java inherits it.
- The hash code provides a numeric representation of an object.
The method hashCode has a different implementation in different classes. In the String class, hashCode is computed by the following formula
s.charAt(0) * 31n-1 + s.charAt(1) * 31n-2 + ... + s.charAt(n-1)
where s is a string and n is its length. An example
"ABC" = 'A' * 312 + 'B' * 31 + 'C' = 65 * 312 + 66 * 31 + 67 = 64578
Note that Java’s hashCode method might return a negative integer.
If a string is long enough, its hashcode will be bigger than the largest integer we can store on 32 bits CPU.
In this case, due to integer overflow, the value returned by hashCode can be negative.
When we put objects into a hashtable, it is possible that different objects (by the equals() method) might have the same hashcode. This is called a collision.
How to resolve collisions?
- One way is by putting the keys that collide in a linked list.
- A linked list is formed at that array index location and a new entry is stored as the next node.
- A hash table then is an array of lists.
Why the Hash Table?
- The big advantage of using a hash table is a constant-time performance for the basic operations add, remove, contains size.
- Though, because of collisions, we cannot guarantee the constant runtime in the worst-case.
- Why? Imagine that all our objects collide into the same index.
- Then searching for one of them will be equivalent to searching in a list, that takes a linear runtime.
- However, we can guarantee an expected constant runtime, if we make sure that our lists won’t become too long.
What is Rehashing?
- The above problem of a too long list is usually implemented by maintaining a load factor that keeps a track of the average length of lists.
- If a load factor approaches a set in the advanced threshold, we create a bigger array and rehash all elements from the old table into the new one. This is called Rehashing.