14. Map ADT and hashtables
20/03/23
Hack Tables (hash maps)
- Are concrete data structure which is suitable for implementing maps
- Basic idea: convert each key into an index into a (big) array
- Look-up of keys and insertion and deletion in a hash table usually runs in O(1) time.
- Not guaranteed, and design of the table needs to be done carefully if want the access to be 'reliably O(1)'
Hash Functions
- A hash function maps keys of a given type to integers in a fixed interval.
- Integer is called the hash value of key
- Hash table for a given key type consists of
- Hash function
- Array of size
- When implementing a map with a hash table, the goal is to store them at index
Collision Handling
- Collisions occur when different elements are mapped to the same cell
- A lot of the theory and practice of hashing consists of devising better ways to avoid or handle collisions
Hash Functions
- Hash function is usually specified as the composition of two functions:
- Hash Code: keys integers
- Compression Function: integers
- Hash code is applied first, and the compression function is applied next on the result
- Goal of the hash function is to disperse the keys in an apparently random way
Dispersal
- Reduce numbers of collision
- Random means to pattern
- If there is an obvious pattern then the incoming data might have a matching pattern that leads to many collisions
Compression Functions
Division:
- Size of the hash table is usually chosen to be a prime
Multiple, Add and Divide (MAD)
- and are non-negative integers
Collision Handling
- Collisions occur when different elements are mapped to the same cell
- Separate Chaining: Let each cell in the table point to (e.g.) a linked list of entries that map there
Map Methods with separate chaining used for collisions
???
Separate chaining
- Simple and fast, but requires additional memory outside the table
- When memory is critical then we try harder to remain within the existing memory
Open Addressing
The colliding item is placed in a different cell of the table
- Linear Probing - Handles collisions by placing the colliding item in the next (circularly) available table cell (variant: cell + c where c is a constant)
- Each table cell inspected is referred to as a "probe"
- Disadvantage: Colliding items lump together, causing future collisions to cause a longer sequence of probes
Search with linear probing
Consider a hash table that uses linear probing
get(k)
- Start at cell
h(k)
- Probe consecutive locations until one of the following occurs
- An item with key is found or
- An empty cell is found or
- cells have been unsuccessfully probed
Double Hashing
- Uses a secondary hash function and handles collisions by placing an item in the first available cell of the series
- Secondary hash function cannot have zero values
- Linear probing is just
- Table size must be a prime to allow probing of all the cells
Performance of Hashing
- In the worst case, searches, insertions and removals on a hash table take time
- The worst case occurs when all the keys inserted into the map collide
- The load factor affects the performance of a hash table
- Expected running time of all the map ADT operations in a hash table is
- Hashing is very fast provided the load factor is not close to 100%
Re-Hashing
- When table is too full, then re-hash.
- Create a new larger table and new hash function
- Have to transfer all the entries from the old table to the new one
Applications
- Small databases
- Compilers
- Browser cache
Comparison of HashMap and PQ
- Does not use the ordering of keys
- PQ does not allow direct access to a key