random table-driven hash benchmarked

Chuck Lever (cel@monkey.org)
Thu, 15 Apr 1999 17:49:23 -0400 (EDT)


ok, i re-tried the random table-driven hash function in the buffer cache.
here's my hash function, derived from the good discussion on the list:

#define BYTE1(x) (0xff & (x))
#define BYTE2(x) (0xff & (((x) >> 8) + 67))
#define BYTE3(x) (0xff & (((x) >> 16) + 131))
#define BYTE4(x) (0xff & (((x) >> 24) + 197))

#define _hashfn(dev,block) \
( random_table[BYTE1(block)] + \
random_table[BYTE2(block)] + \
random_table[BYTE3(block)] + \
random_table[BYTE4(block)] ) & bh_hash_mask

random_table is an array of 256 unsigned shorts, all random, ranging
between 0 and 65535.

the random table-driven hash function features:

1. a single random table that takes up only 512 bytes
2. the ability to address a hash table up to 65536 buckets flexibly
(just vary the bits in the bh_hash_mask mask)
3. mapping all 32 bits of input into the hash table index
4. no multiplication or division

here's the multiplicative hash, for comparison:

/* this is 40499 * 65543 - both are prime; the result is ~0.68*(2^32) */
#define MULTIPLIER 2654425957UL

#define _hashfn(dev,block) \
((((unsigned long) (block) * MULTIPLIER) >> 11) & bh_hash_mask)

i benchmarked this against a kernel that used my multiplicative hash on an
AMD K6 at 233Mhz with 64M and a 9G Seagate IDE drive. the results below
are 5-run average throughput values in "scripts per hour", with standard
deviation listed afterwards. the offered load varies from 8 concurrent
scripts to 32 concurrent scripts.

8 scripts 16 scripts 24 scripts 32 scripts
mult: 1088.5 s=4.38 1051.8 s=6.59 1019.7 s=6.79 986.7 s=8.56
rand: 1084.7 s=1.16 1039.7 s=14.99 1004.8 s=6.86 987.2 s=5.47

both runs used a 16384 bucket hash table, and averaged around 0.1
find_buffer() iterations per lookup, during a hit rate of about 78%.
the hash table bucket size distributions looked about the same for both
hash functions.

the way i read the results, using the table-driven algorithm above isn't a
win over multiplicative hashing.

i believe that the table-driven hash function itself is expensive to
calculate on any architecture. it requires 25 instructions in 92 bytes on
ia32, not to mention all the memory references and register spillage
required to calculate a result. and hopefully the compiler has
rescheduled the instructions to avoid address generation interlock;
otherwise we can expect a few pipeline stalls during the computation.

- Chuck Lever

--
corporate:	<chuckl@netscape.com>
personal:	<chucklever@netscape.net> or <cel@monkey.org>

The Linux Scalability project: http://www.citi.umich.edu/projects/citi-netscape/

- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majordomo@vger.rutgers.edu Please read the FAQ at http://www.tux.org/lkml/