Re: more on hash functions

Stephen C. Tweedie (sct@redhat.com)
Thu, 15 Apr 1999 14:47:39 +0100 (BST)


Hi,

On Wed, 14 Apr 1999 23:35:33 -0400 (EDT), Chuck Lever <cel@monkey.org>
said:

> unsigned char rand1[256] = { ... };
> unsigned char rand2[256] = { ... };

> #define FIRSTBYTE(x) (0xff & (x))
> #define SECNDBYTE(x) (0xff & ((x)>>8))
> #define THIRDBYTE(x) (0xff & ((x)>>16))

> #define hashfn(key) ( (rand1[FIRSTBYTE(key)] << 4) ^ \
> (rand2[SECNDBYTE(key)] << 2) ^ \
> (rand1[THIRDBYTE(key)]) )

> to generate a 12-bit hash table index. this way i don't have to mask off
> extra bits as the final step, and i can use daintily sized random tables.

This is a bad way to go about things. You really do want all of the
input data to be significant in all bits of the final hash function.
The way this is usually achieved is to use full 32-bit wide random
tables, and to xor them together with full precision all the way
through the hash computation. That gives you a final hash value where
every result bit is a nearly-random function of every input bit, and
you can truncate the result to any precision you need. You still only
need 256-entry tables, of course.

unsigned int random[256];
static inline unsigned int hash(char key[3])
{
return random[key[0]] ^ random[key[1]] ^ random[key[2]];
}

If you want the final hash to be a fully random function of every bit
in the input data, you neeed to use separate 32-bit-wide 256-entry
random tables for each byte of the input. You'd have to profile it to
see if that really mattered in this case: you don't want to increase
the cache profile of the random table any more than necessary.

Cheers,
Stephen.

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@vger.rutgers.edu
Please read the FAQ at http://www.tux.org/lkml/