Skip to main content

14. Map ADT and hashtables

20/03/23

Hack Tables (hash maps)

  • Are concrete data structure which is suitable for implementing maps
  • Basic idea: convert each key into an index into a (big) array
  • Look-up of keys and insertion and deletion in a hash table usually runs in O(1) time.
    • Not guaranteed, and design of the table needs to be done carefully if want the access to be 'reliably O(1)'

Hash Functions

  • A hash function hh maps keys of a given type to integers in a fixed interval.
  • Integer h(k)h(k) is called the hash value of key kk
  • Hash table for a given key type consists of
    • Hash function hh
    • Array of size NN
  • When implementing a map with a hash table, the goal is to store them at index i=h(k)i=h(k)

Collision Handling

  • Collisions occur when different elements are mapped to the same cell
  • A lot of the theory and practice of hashing consists of devising better ways to avoid or handle collisions

Hash Functions

  • Hash function is usually specified as the composition of two functions:
  • Hash Code: h1:h_1: keys \to integers
  • Compression Function: h2:h_2: integers [0,N1]\to [0,N-1]
  • Hash code is applied first, and the compression function is applied next on the result
  • Goal of the hash function is to disperse the keys in an apparently random way

Dispersal

  • Reduce numbers of collision
  • Random means to pattern
  • If there is an obvious pattern then the incoming data might have a matching pattern that leads to many collisions

Compression Functions

Division:

  • h2(y)=ymodNh_2(y)=y \mod N
  • Size NN of the hash table is usually chosen to be a prime

Multiple, Add and Divide (MAD)

  • h2(y)=(ay+b)modNh_2(y)=(ay+b)\mod N
  • aa and bb are non-negative integers

Collision Handling

  • Collisions occur when different elements are mapped to the same cell
  • Separate Chaining: Let each cell in the table point to (e.g.) a linked list of entries that map there

Map Methods with separate chaining used for collisions

???

Separate chaining

  • Simple and fast, but requires additional memory outside the table
  • When memory is critical then we try harder to remain within the existing memory

Open Addressing

The colliding item is placed in a different cell of the table

  • Linear Probing - Handles collisions by placing the colliding item in the next (circularly) available table cell (variant: cell + c where c is a constant)
  • Each table cell inspected is referred to as a "probe"
  • Disadvantage: Colliding items lump together, causing future collisions to cause a longer sequence of probes

Search with linear probing

Consider a hash table AA that uses linear probing get(k)

  • Start at cell h(k)
  • Probe consecutive locations until one of the following occurs
    • An item with key kk is found or
    • An empty cell is found or
    • NN cells have been unsuccessfully probed

Double Hashing

  • Uses a secondary hash function d(k)d(k) and handles collisions by placing an item in the first available cell of the series
  • Secondary hash function d(k)d(k) cannot have zero values
  • Linear probing is just d(k)=1d(k)=1
  • Table size NN must be a prime to allow probing of all the cells

Performance of Hashing

  • In the worst case, searches, insertions and removals on a hash table take O(n)O(n) time
  • The worst case occurs when all the keys inserted into the map collide
  • The load factor α=nN\alpha=n|N affects the performance of a hash table
  • Expected running time of all the map ADT operations in a hash table is O(1)O(1)
  • Hashing is very fast provided the load factor is not close to 100%

Re-Hashing

  • When table is too full, then re-hash.
  • Create a new larger table and new hash function
  • Have to transfer all the entries from the old table to the new one

Applications

  • Small databases
  • Compilers
  • Browser cache

Comparison of HashMap and PQ

  • Does not use the ordering of keys
  • PQ does not allow direct access to a key