Re: more on hash functions

Paul F. Dietz (dietz@interaccess.com)
Sat, 10 Apr 1999 17:15:01 -0500


Chuck Lever wrote:

> agreed that multiplication should be avoided. however, in some cases
> nothing else works.

However, this is not one of those cases, if loads are
much cheaper than multiplication. A quick
and easy hash function would be to hash a 20 bit
value by using its two 10 bit words as indices into
two tables of randomly chosen values (in the
range 0..PAGE_HASH_SIZE-1). Xor the two values together.

This class of hash functions (parameterized by
the table contents) is provably good, in the sense
that for any two distinct keys, a randomly chosen
hash function from this class will cause the keys
to collide with probability 1/PAGE_HASH_SIZE
(the class of hash functions is "universal";
see Cormen, Leiserson, and Rivest, "Introduction
to Algorithms", chapter 12.)

Moreover, the expected behavior of hashing with
chaining is O(n/m) time per operation for n values
stored in a table of size m (basically, pairwise
independence of the hash values is enough to
ensure this.) Again, this is regardless of
the actual keys; the randomness comes from the
tables.

Paul

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@vger.rutgers.edu
Please read the FAQ at http://www.tux.org/lkml/