Skip to main content

14. Map ADT and hashtables

20/03/23

Hack Tables (hash maps)

Are concrete data structure which is suitable for implementing maps
Basic idea: convert each key into an index into a (big) array
Look-up of keys and insertion and deletion in a hash table usually runs in O(1) time.
- Not guaranteed, and design of the table needs to be done carefully if want the access to be 'reliably O(1)'

Hash Functions

A hash function $h$ maps keys of a given type to integers in a fixed interval.
Integer $h(k)$ is called the hash value of key $k$
Hash table for a given key type consists of
- Hash function $h$
- Array of size $N$
When implementing a map with a hash table, the goal is to store them at index $i=h(k)$

Collision Handling

Collisions occur when different elements are mapped to the same cell
A lot of the theory and practice of hashing consists of devising better ways to avoid or handle collisions

Hash Functions

Hash function is usually specified as the composition of two functions:
Hash Code: $h_1:$ keys $\to$ integers
Compression Function: $h_2:$ integers $\to [0,N-1]$
Hash code is applied first, and the compression function is applied next on the result
Goal of the hash function is to disperse the keys in an apparently random way

Dispersal

Reduce numbers of collision
Random means to pattern
If there is an obvious pattern then the incoming data might have a matching pattern that leads to many collisions

Compression Functions

Division:

$h_2(y)=y \mod N$
Size $N$ of the hash table is usually chosen to be a prime

Multiple, Add and Divide (MAD)

$h_2(y)=(ay+b)\mod N$
$a$ and $b$ are non-negative integers

Collision Handling

Collisions occur when different elements are mapped to the same cell
Separate Chaining: Let each cell in the table point to (e.g.) a linked list of entries that map there

Map Methods with separate chaining used for collisions

???

Separate chaining

Simple and fast, but requires additional memory outside the table
When memory is critical then we try harder to remain within the existing memory

Open Addressing

The colliding item is placed in a different cell of the table

Linear Probing - Handles collisions by placing the colliding item in the next (circularly) available table cell (variant: cell + c where c is a constant)
Each table cell inspected is referred to as a "probe"
Disadvantage: Colliding items lump together, causing future collisions to cause a longer sequence of probes

Search with linear probing

Consider a hash table $A$ that uses linear probing get(k)

Start at cell h(k)
Probe consecutive locations until one of the following occurs
- An item with key $k$ is found or
- An empty cell is found or
- $N$ cells have been unsuccessfully probed

Double Hashing

Uses a secondary hash function $d(k)$ and handles collisions by placing an item in the first available cell of the series
Secondary hash function $d(k)$ cannot have zero values
Linear probing is just $d(k)=1$
Table size $N$ must be a prime to allow probing of all the cells

Performance of Hashing

In the worst case, searches, insertions and removals on a hash table take $O(n)$ time
The worst case occurs when all the keys inserted into the map collide
The load factor $\alpha=n|N$ affects the performance of a hash table
Expected running time of all the map ADT operations in a hash table is $O(1)$
Hashing is very fast provided the load factor is not close to 100%

Re-Hashing

When table is too full, then re-hash.
Create a new larger table and new hash function
Have to transfer all the entries from the old table to the new one

Applications

Small databases
Compilers
Browser cache

Comparison of HashMap and PQ

Does not use the ordering of keys
PQ does not allow direct access to a key

Hack Tables (hash maps)
Map Methods with separate chaining used for collisions
Separate chaining
Performance of Hashing
Re-Hashing
Applications
Comparison of HashMap and PQ