Peter Occil

A Note on Hash Functions

A hash function is a function that maps an input of arbitrary size into an output with a fixed number of bits, called a hash code.

Hash functions are used for the following purposes:

For the use in hash tables, (Richter et al. 2015)1 recommends multiply-then-shift hashing over more complicated hash functions in most cases. Fibonacci hashing is a special case of multiply-then-shift hashing that serves to improve hash codes.

There are security attacks that serve to trigger worst-case performance on hash tables via carefully chosen keys, and keyed hash functions (such as SipHash) have been developed to mitigate them, but not everyone believes such hash functions should be used in hash tables (for example, see R. Urban’s SMHasher fork).

Bret Mulvey has written a page on how hash functions are built. According to him, hash functions have as their core a mixing function (a function that maps N-bit inputs to N-bit outputs), and for hash functions the “best” mixing functions tend to produce wildly dispersed outputs for similar inputs.

Notes

  1. Richter, Alvarez, Dittrich, “A Seven-Dimensional Analysis of Hashing Methods and its Implications on Query Processing”, Proceedings of the VLDB Endowment 9(3), 2015.