vma lookup +

Clayton Weaver (cgweav@eskimo.com)
Wed, 9 Sep 1998 18:14:17 -0700 (PDT)


I see what Miller is saying, of course:

The big cost for an avl implementation is tree rebalance on insert and
delete. It's not just that rebalance is a lot of operations, it's
also branchy code, which causes pipeline stalls on superscalar
processors. Hash code tends to replace most of those branch
instructions with shifts, xors, and accumulating adds, with few
branches if any to get to the hash slot.
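
To make that concrete, a slot computation along those lines is all
straight-line arithmetic. This is only a sketch; the shift counts, the
4k-page assumption, and the names are mine, not a patch:

/* Illustrative only: compute a hash slot from a vma's start address
 * with nothing but shifts, xors, and a mask; no branches until the
 * chain walk itself.
 */
#define VMA_HASH_BITS   8
#define VMA_HASH_SLOTS  (1 << VMA_HASH_BITS)

static inline unsigned int vma_hash_slot(unsigned long addr)
{
        unsigned long h = addr >> 12;   /* assume 4k pages: drop offset bits */

        h ^= h >> 8;                    /* fold higher bits down */
        h ^= h >> 16;
        return h & (VMA_HASH_SLOTS - 1);
}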

But hash chains with a fixed number of hash slots don't have the avl
tree's upper bound on lookup time, so you potentially hose application
servers that allocate large numbers of vmas per process. Maybe you can
improve on that with dynamic or fuzzy hashing, but the avl tree
is already debugged, it works, it is ready now, and it effectively
contains the worst case for vma lookup time.
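
(Rough numbers: with 256 fixed slots, a process that maps 64k vmas has
an average chain of 256 entries, and a badly distributed key can make
the worst slot much longer than that, while the avl bound for 64k nodes
is about 1.44 * log2(65536), call it 23 comparisons.)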

It doesn't have to be one or the other, anyway, to use the 8-bit
search space partition that I suggested. You can have a hash algorithm
for vma lookups that optimizes for vma build and teardown time at the
cost of less than impressive performance for prolific vma users and
compile that in on production servers that tend toward lots of
short-lived processes with small vma usage. You can have the avl
algorithm and use that instead on application servers that need
the avl tree's upper bound on vma lookups (I don't know exactly how
you would describe the tradeoffs in Configure.help in terms a new user
would understand, however; better make the avl tree the default). If
someone comes up with a debugged dynamic or fuzzy hash algorithm that
improves on the performance of the avl implementation for both small
and huge numbers of vma allocations, great, everybody wins.
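
The shape of the compile-time switch would be something like the
following. CONFIG_MM_VMA_HASH and the two lookup variants are names I
just made up to show the idea, nothing that exists today:

/* Hypothetical compile-time selection of the vma lookup structure. */
extern struct vm_area_struct *find_vma_hash(struct mm_struct *, unsigned long);
extern struct vm_area_struct *find_vma_avl(struct mm_struct *, unsigned long);

struct vm_area_struct *find_vma(struct mm_struct *mm, unsigned long addr)
{
#ifdef CONFIG_MM_VMA_HASH
        return find_vma_hash(mm, addr); /* fast vma build/teardown */
#else
        return find_vma_avl(mm, addr);  /* bounded worst-case lookup */
#endif
}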

The 8-bit partition doesn't have to be the low byte, either. With a
shift and mask instead of just a mask, it can be any contiguous 8 bits
in the vma key (if the keys are addresses, allocation alignment might
suggest better key distribution with a different 8 bits than the low
byte). Ideally, you want the distribution of those bits to be as close
to sequential as you can get for the first 256 vmas allocated for a
process, so that lookup for those vmas is "deref-compare-return",
which improves quite a bit on the lookup bound of a 32 vma linked
list. Even without perfect distribution, you should get a big
performance boost on average lookup time for small numbers of vmas,
for 1k per process of additional vm metadata. The additional cost of
insert for a root node is just assigning to the array index and
zeroing the two subtree pointers in the avl tree node, next to nothing
compared to the linked list if you're consistently finding the vma you
want with one array deref and one key compare.
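
To show the lookup side of the 8-bit partition, here is a self-contained
sketch. The names, the choice of which 8 bits to use (PARTITION_SHIFT),
and the stripped-down node layout are all mine, purely illustrative:

#define PARTITION_SHIFT 12              /* which 8 bits to use is tunable */
#define PARTITION_SLOTS 256

struct vma_node {
        unsigned long start, end;       /* [start, end) covered by this vma */
        struct vma_node *left, *right;  /* avl subtree pointers */
};

/* In real life this table would live in the per-process mm structure. */
struct vma_node *partition_roots[PARTITION_SLOTS];

struct vma_node *vma_lookup(unsigned long addr)
{
        unsigned int slot = (addr >> PARTITION_SHIFT) & (PARTITION_SLOTS - 1);
        struct vma_node *node = partition_roots[slot];

        /* The hoped-for common case is one vma per slot, which makes
         * this loop a single deref-compare-return.  With more vmas in
         * the slot it degrades gracefully into an ordinary avl descent
         * of a much smaller subtree.
         */
        while (node) {
                if (addr < node->start)
                        node = node->left;
                else if (addr >= node->end)
                        node = node->right;
                else
                        return node;
        }
        return NULL;                    /* no vma covers addr */
}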

Reducing the key space in an avl tree by 8 bits also reduces the upper
bound on rebalance cost after insert or delete, where the worst case
is a function of bits_per_key (how many nodes the tree can have).
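
(Back of the envelope: the worst-case height of an avl tree is roughly
1.44 * log2(n). Splitting the key space across 256 subtrees divides the
maximum node count per tree by 256, so the height bound, and with it
the longest possible rebalance path, drops by about 1.44 * 8, call it
11 or 12 levels.)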

Regards, Clayton Weaver cgweav@eskimo.com (Seattle)

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@vger.rutgers.edu
Please read the FAQ at http://www.tux.org/lkml/faq.html