Re: [PFC]: hash instrumentation

Chuck Lever (cel@monkey.org)
Tue, 13 Apr 1999 23:45:51 -0400 (EDT)

Messages sorted by: [ date ][ thread ][ subject ][ author ]
Next message: Gregory Maxwell: "Re: ESP"
Previous message: David Lang: "Re: Static Swap"

On Wed, 14 Apr 1999, Stephen C. Tweedie wrote:
> As an interesting data point, I applied Chuck's last pair of buffer and
> page cache hash updates (the new buffer hash function, and the larger
> PAGE_HASH_BITS) to a 2.2.5 kernel to measure the impact on a parallel
> build of the current glibc-2.1 rpms on a 4-way Xeon. Each of those two
> improvements gained about 30 seconds of wall-clock time on a complete
> build, taking it down from 15 minutes to 14.

how many hash bits did you try? 13? you might consider trying even more,
say 15 or 16. benchmarking has shown that the page hash function is
stable for any bit size between 11 and 16 (i didn't try others), so
varying it, as Doug's patch does, won't degenerate the hash.

> Adding 4 extra bits to the dcache hash was worth 2 full minutes; 12
> minutes total build time.
>
> This was absolutely obvious from the readprofile(8) traces, where
> d_lookup() was massively dominating the system CPU time (I had made both
> the page and buffer hash lookup functions into externs at this point so
> that they would all be profiled separately, not as inlines). d_lookup
> was using about 4 times as much CPU as the nearest other functions
> (which iirc were the page fault COW handler and file read() actors, both
> of which perform large copies).

readprofile traces i've taken show mostly the same thing, although the
page fault handler is by an order of magnitude the highest CPU user
(find_vma() is still up there, too). increasing the efficiency of the
dcache hash will buy probably the biggest performance improvement of any
hash modifications. i've also found that changing the x-or's in the
dcache hash function to addition operations improves the bucket size
distribution and buys a bit more improvement.

> Shrinking the dcaches excessively in this case will simply masaccre the
> performance.

actually, that's not strictly true. shrinking the dcache early will
improve the lookup efficiency of the hash, i've found almost by two times.
there's probably compromise here, so, if you're not _too_ aggressive about
shrinking it (but more aggressive than shrink_dcache**() is now), you may
find even more performance improvement.

my preliminary analysis of the inode hash table is that it is also too
small. find_inode() seems to be used mostly for unsuccessful searches
these days, and the hash table size is so small that the buckets contain
long long lists of inodes, making the unsuccessful searches very
expensive.

- Chuck Lever

-- corporate: <chuckl@netscape.com> personal: <chucklever@netscape.net> or <cel@monkey.org>

The Linux Scalability project: http://www.citi.umich.edu/projects/citi-netscape/

- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majordomo@vger.rutgers.edu Please read the FAQ at http://www.tux.org/lkml/

Next message: Gregory Maxwell: "Re: ESP"
Previous message: David Lang: "Re: Static Swap"