Hash Table

Solves two problems when storing in Binary Search Trees
- Cannot store data that is not comparable
- Insert and search are $O (lo g n)$
Utilizes the constant time access of Arrays
Process
- Data -[Hash Function]→ Hash Code -[Reduce]→ Index
  - Reduce(Hash): Hash % Buckets
- Insert data at index
- If there is other data at that index, then use a collision resolution mechanism
Hash Maps are a Hash Table that is not thread-safe

Hash Functions

Converts a data object to a hash code
Properties
- Necessary
  - Input: Object x
  - Output: An integer representation of x
  - If x = y, H(x) = H(y)
- Best if
  - If x != y, H(x) != H(y)
  - Evenly distributed
  - Easy to compute
Collisions = when two unique inputs have the same hash
- Possible whenever the Pigeonhole Principle applies
- That would always apply because giving every input a unique output would lead to overflows
Ex: Strings
- H(x): (31 ^ char idx) * ascii value
- Prime numbers prevent collisions because they prevent grouping when a modulo reduce function is applied

Buckets store linked list, collisions are appended to the list
- This can be optimized by using a binary search tree instead of a linked list
When searching, find the bucket that contains the item and then search the linked list for the element
Allows for a load factor > 1, though that is not preferable
Less commonly followed because the data is no longer stored contiguously, which slows down access times

All entries are directly in a bucket
Collisions are added to empty spot dependent on probing type
- Linear Probing: next empty spot
- Quadratic Probing: try hash + $n^{2}$ each time one is already filled… (prevents clustering)
When searching, follow the probing type until you find the element
Experimentally faster because contiguous memory access is faster
Index is not directly determined by the hash code (ie. index is open)
Finding after deletion is complicated
- Because an element could have been shifted and then the first occupant was deleted, making it look like the element is not in the table
- Solution: Statuses
  - Each bucket gets a status of never used, deleted, occupied
  - If you check a bucket at an index for an element and it was never used, the element is not in the table
  - If the status was the other two, you should probe until you either see a never used or find the element
We count each failed attempt as another collision (so if the probing function moves twice it counts as 2 collisions)

Search/Index/Delete
- Best/Average case: $O (1)$
- Worst case: $O (n)$
  - Everything is mapping to the same index (bad hash function)