Re: 2.2.6_andrea2.bz2

Stefan Monnier (foo@acm.com)
Wed, 5 May 1999 17:19:17 -0400


--------
>>>>> "Michael" == Michael Schulz <micha@nats.informatik.uni-hamburg.de> writes:
> So for obvious reasons a hash-function will always reveil better spreading
> over the table, when it is totaly calculated within the prime modulus, which
> is the tablesize. If you use nonprimes, objects tend to be stored in the same
> buckets, which you basicaly what to prevent.

This is a highly debated issue and hastable "experts" disagree.
The `mod 2^n' problem only shows up if your hash value (before modulus)
has a tendency to show some kind of cyclic behavior with cycles of size
2^m. In such a case a `mod prime' will avoid collisions and spread your
data much better.
But if your hash function doesn't suffer from such a 2^m problem, then
`mod 2^n' will work just fine.
What it usually boils down to is whether one wants to use fast bit-twiddling
operations in the hash-function (leading to 2^m artifacts) but with a slow
`mod prime' at the end or whether one prefers a fast `mod 2^n' but with
either a higher collision rate or with a slow hash-function that randomizes
more carefully (using operations such as multiplication by a prime number
or table lookups) in order not to suffer from 2^m artifacts.

I believe the hash function used by Chuck Lever uses the latter approach
(with a multiplication by a prime number). This also saves you from
the burden of having to find the next prime number. Also a multiplication
by a big prime constant is usually faster than finding a (non-constant)
modulus.

Stefan

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@vger.rutgers.edu
Please read the FAQ at http://www.tux.org/lkml/