Re: 2.2.6_andrea2.bz2

Stephen C. Tweedie (sct@redhat.com)
Tue, 4 May 1999 14:51:00 +0100 (BST)


Hi,

On 02 May 1999 00:23:23 +0300, hjstein@bfr.co.il (Harvey J. Stein) said:

> "Stephen C. Tweedie" <sct@redhat.com> writes:

>> In other words, I can throw memory at hashing to make it faster,
>> but trees have a fixed cost which necessarily grows with their
>> size.

> Well, sort of. If you assume your hash function is sufficiently good,
> then yes. But this is a pretty strong assumption.

Chuck Lever has been profiling various hash functions. We can get
pretty good has bucket distribution in real life.

> On the other hand, 2^20 = 1048576, so if you're storing < 1 million
> objects, you've got at worst 20 nodes to go through in a balanced
> tree. For a red-black tree the lack of balance gives you a factor of
> 2 in the max path length. So, at worst you have to go through 40
> nodes for a million objects.

But each one of those searches is slower than a hash chain search, and
insert/delete become O(log n) instead of O(1) too.

> With hash tables you can, for example, go from size N to size 2N
> whenever you have more than 3N items, which is still O(1) on average,
> but will take significant time on the (rare) occasions of doubling the
> table size.

Irrelevant: aggregate amortised performance is really the only important
thing here.

> But, even when you do this, you're only guaranteeing an average bucket
> size <3, and if you happen into a pathological situation (everything
> hashing to the same #), you'll have O(N) access times no matter what
> you do to the hash table size.

Sure. We don't see that in practice.

> Trees basically have worse average behavior but better worst case
> behavior.

They have an average-case behaviour which cannot be improved by
increasing table sizes. For large, performance-critical caches such as
the page or buffer caches (which can easily grow to over a gig on large
machines), that's a killer.

> But trees could still be handy even if they're slower. For example,
> if you really need the ordering that trees maintain (maybe for the
> elevator algorithm for disk i/o), then they could be a big win even if
> they're slower for the average case.

Absolutely. For places where we have variable-length objects which
require ordering, we already do this --- look at the AVL trees in the
VM.

--Stephen

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@vger.rutgers.edu
Please read the FAQ at http://www.tux.org/lkml/