Hash functions (was Re: 2.2.6_andrea2.bz2)

Harvey J. Stein (hjstein@bfr.co.il)
06 May 1999 08:31:18 +0300


"Stefan Monnier <foo@acm.com>" <monnier+lists/linux/kernel/news/@tequila.cs.yale.edu> writes:

> What it usually boils down to is whether one wants to use fast bit-twiddling
> operations in the hash-function (leading to 2^m artifacts) but with a slow
> `mod prime' at the end or whether one prefers a fast `mod 2^n' but with
> either a higher collision rate or with a slow hash-function that randomizes
> more carefully (using operations such as multiplication by a prime number
> or table lookups) in order not to suffer from 2^m artifacts.
>
> I believe the hash function used by Chuck Lever uses the latter approach
> (with a multiplication by a prime number). This also saves you from
> the burden of having to find the next prime number. Also a multiplication
> by a big prime constant is usually faster than finding a (non-constant)
> modulus.

That can't be right. An even times a prime is still even, so you'll
still miss alternate buckets if your table size is a power of 2. More
precisely, if a hash_fcn mod 2^m suffers from poor distribution, then
hash_fcn * prime mod 2^m will have the same poor distribution - 2
objects that went into the same bucket under hash_fcn will still be in
the same bucket under hash_fcn * prime because primes are invertible
mod 2^m.

-- 
Harvey J. Stein
BFM Financial Research
hjstein@bfr.co.il

- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majordomo@vger.rutgers.edu Please read the FAQ at http://www.tux.org/lkml/