On 02 May 1999 00:23:23 +0300, hjstein@bfr.co.il (Harvey J. Stein) said:
> "Stephen C. Tweedie" <sct@redhat.com> writes:
>> In other words, I can throw memory at hashing to make it faster,
>> but trees have a fixed cost which necessarily grows with their
>> size.
> Well, sort of. If you assume your hash function is sufficiently good,
> then yes. But this is a pretty strong assumption.
Chuck Lever has been profiling various hash functions. We can get
pretty good has bucket distribution in real life.
> On the other hand, 2^20 = 1048576, so if you're storing < 1 million
> objects, you've got at worst 20 nodes to go through in a balanced
> tree. For a red-black tree the lack of balance gives you a factor of
> 2 in the max path length. So, at worst you have to go through 40
> nodes for a million objects.
But each one of those searches is slower than a hash chain search, and
insert/delete become O(log n) instead of O(1) too.
> With hash tables you can, for example, go from size N to size 2N
> whenever you have more than 3N items, which is still O(1) on average,
> but will take significant time on the (rare) occasions of doubling the
> table size.
Irrelevant: aggregate amortised performance is really the only important
thing here.
> But, even when you do this, you're only guaranteeing an average bucket
> size <3, and if you happen into a pathological situation (everything
> hashing to the same #), you'll have O(N) access times no matter what
> you do to the hash table size.
Sure. We don't see that in practice.
> Trees basically have worse average behavior but better worst case
> behavior.
They have an average-case behaviour which cannot be improved by
increasing table sizes. For large, performance-critical caches such as
the page or buffer caches (which can easily grow to over a gig on large
machines), that's a killer.
> But trees could still be handy even if they're slower. For example,
> if you really need the ordering that trees maintain (maybe for the
> elevator algorithm for disk i/o), then they could be a big win even if
> they're slower for the average case.
Absolutely. For places where we have variable-length objects which
require ordering, we already do this --- look at the AVL trees in the
VM.
--Stephen
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@vger.rutgers.edu
Please read the FAQ at http://www.tux.org/lkml/