Re: more on hash functions

Chuck Lever (cel@monkey.org)
Mon, 12 Apr 1999 15:27:03 -0400 (EDT)


On Sun, 11 Apr 1999, Paul F. Dietz wrote:
> Memory use can be reduced by using more than two
> tables. If we divide the 24 bit key into 4 6 bit
> fields, the total table size is 512 bytes (assuming
> each table entry is 2 bytes). This doubles the
> number of loads, granted, and increases the shift/mask
> work. Divided into 3 8 bit fields (that should be easy),
> the memory use is 1.5K bytes.

actually, this was my first implementation choice. i found that the hash
function didn't work very well with three 8-bit tables -- the bucket size
distribution was poor. how does one combine three 8-bit values
appropriately to get, say, a 14-bit table index?
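
for concreteness, a rough sketch of the 3-table variant (the names here are
made up, and XOR is just one assumed way of combining the lookups, not
necessarily the right one):

unsigned short tbl0[256], tbl1[256], tbl2[256];	/* filled with random 16-bit values */

unsigned int hash24(unsigned int key, unsigned int bits)
{
	unsigned short h;

	h  = tbl0[ key        & 0xff];
	h ^= tbl1[(key >>  8) & 0xff];
	h ^= tbl2[(key >> 16) & 0xff];

	return h & ((1 << bits) - 1);	/* e.g. bits == 14 */
}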

> The objection that you need more random tables
> for different hash table sizes is not right.
> Just mask off the irrelevant high order bits
> of the hash value. For tables containing two-byte
> random values, this supports hash tables of size
> up to 2^16.

> I'd be very careful about populating the entries
> of the random tables -- be sure the values chosen
> really are random. If the low order bit is not
> random, for example, you may lose half the hash
> table.

to generate the table, i used random(), seeded with the lower 16 bits of
tv_sec from gettimeofday().
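in other words, something like this (a sketch; the function name is just
for illustration):

#include <stdlib.h>
#include <sys/time.h>

static void seed_table_generator(void)
{
	struct timeval tv;

	gettimeofday(&tv, NULL);
	srandom(tv.tv_sec & 0xffff);	/* low 16 bits of tv_sec */
}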

/* fill random_table[] with a random permutation of 0 .. random_table_size - 1
 * (entries are assumed signed here, so -1 can mark an empty slot) */
for (i = 0; i < random_table_size; i++)
	random_table[i] = -1;			/* mark every slot empty */

for (i = 0; i < random_table_size; i++) {
	j = random() % random_table_size;	/* pick a random starting slot */
	while (random_table[j] != -1) {		/* slot taken: probe forward */
		j++;
		if (j == random_table_size)
			j = 0;			/* wrap around */
	}
	random_table[j] = i;
}

this means the table maps each input value to a unique output between 0 and
random_table_size - 1 -- in other words, it holds a random permutation.

however, i think we're making different assumptions about what goes in the
random table. i assumed the entries were random numbers from 0 to
table_size - 1 (i.e. a permutation), whereas some of your comments indicate
you assume the random table is filled with random numbers from 0 to 2^16 - 1.
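if i'm reading you right, in the latter case the fill step would look more
like:

/* independent 16-bit random values; the irrelevant high-order bits
 * get masked off at lookup time, as you describe */
for (i = 0; i < random_table_size; i++)
	random_table[i] = random() & 0xffff;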

> I did not find a comment in Knuth vol. 3 (1st ed.)
> dismissing this idea. Is it in the 2nd ed.? What did
> he say? Universal classes of hash functions had not
> been discovered at the time the 1st ed. was written.

i'm at work now, so i have my copy handy -- yes, it's the second edition.
i was mistaken: his first example *is* table-driven, but there the table is
used to map a small set of known keys to random indices, which is a
different problem.

he cites a number of papers, some as late as 1997, but i don't see a
specific reference to universal hash functions.

- Chuck Lever

--
corporate:	<chuckl@netscape.com>
personal:	<chucklever@netscape.net> or <cel@monkey.org>

The Linux Scalability project: http://www.citi.umich.edu/projects/citi-netscape/
