Re: more on hash functions

Chuck Lever (cel@monkey.org)
Wed, 14 Apr 1999 23:35:33 -0400 (EDT)


On Wed, 14 Apr 1999, Paul F. Dietz wrote:
> > my random tables were constructed such that, for a 256 entry random table,
> > the table contained unique random values from 0 to 255. making these 16
> > bit numbers would only add some hash tablesize flexibility, but i don't
> > believe it would help the hash function generate better randomness.
>
> If the hash table itself has more than 256 entries, then it
> would certainly produce better randomness. The small random
> table entries would mean you are only hashing to the low
> end of the hash table.

i was using the random tables to mix up the bits in each byte, like so:

unsigned char rand1[256] = { ... };
unsigned char rand2[256] = { ... };

#define FIRSTBYTE(x) (0xff & (x))
#define SECNDBYTE(x) (0xff & ((x)>>8))
#define THIRDBYTE(x) (0xff & ((x)>>16))

#define hashfn(key) ( (rand1[FIRSTBYTE(key)] << 4) ^ \
(rand2[SECNDBYTE(key)] << 2) ^ \
(rand1[THIRDBYTE(key)]) )

to generate a 12-bit hash table index. this way i don't have to mask off
extra bits as the final step, and i can use daintily sized random tables.

> In Linux x86, the size of the hash table is 2048 (1 << PAGE_HASH_BITS),
> so your hash function, as implemented, would be very bad.

the page cache hash function, btw, already works very well, and i don't
believe replacing it with any other kind of function will help. the only
hash function we should be concerned with improving is the buffer cache
hash function: see fs/buffer.c, ll. 425-30.

i re-read Knuth over dinner last night. unfortunately, he is discussing
hashes where you store data elements right in the hash table, and can then
easily do linear probing [i heard that, you in the second row]. i can't
think of a good way to apply his double hashing algorithm (algorithm D, p.
528) or Brent's variation (p. 532) to help improve the hash function
without using a multiplicative hash.

- Chuck Lever

--
corporate:	<chuckl@netscape.com>
personal:	<chucklever@netscape.net> or <cel@monkey.org>

The Linux Scalability project: http://www.citi.umich.edu/projects/citi-netscape/

- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majordomo@vger.rutgers.edu Please read the FAQ at http://www.tux.org/lkml/