Re: [PFC]: hash instrumentation

Chuck Lever (cel@monkey.org)
Wed, 14 Apr 1999 12:26:13 -0400 (EDT)


On Wed, 14 Apr 1999, Stephen C. Tweedie wrote:
> > how many hash bits did you try? 13? you might consider trying even more,
> > say 15 or 16. benchmarking has shown that the page hash function is
> > stable for any bit size between 11 and 16 (i didn't try others), so
> > varying it, as Doug's patch does, won't degenerate the hash.
>
> 13, but that was quite enough to eliminate __find_page as a significant
> CPU cost in this instance, as reported by readprofile.

factor in the lower elapsed time and better worst-case (successful
and unsuccessful) hash performance when using a larger table.
surprisingly, CPU cost seems to be only part of the picture.

i also tested adding the raw offset in the page hash function, and across
the board i still see a measurable performance drop.

> Hmm. This looks like another place where dropping the kernel lock
> during the copy would be beneficial: we already hold the mm semaphore at
> the time, so we're not vulnerable to too many races. I'll look at this.

let me be the first to encourage you to do this! :)

> >> Shrinking the dcaches excessively in this case will simply massacre the
> >> performance.
>
> > actually, that's not strictly true. shrinking the dcache early will
> > improve the lookup efficiency of the hash, i've found almost by two
> > times.
>
> Sure, but a glibc build is referencing a _lot_ of header files! My
> concern is that the vmscan loop currently invokes a prune_dcache(0),
> which is as aggressive as you can get. If we do that any more
> frequently, getting a good balance of the dcache will be a lot harder.

andrea's arca10 replaces prune_dcache(0) with something a little more
easy-going:

prune_dcache(dentry_stat.nr_unused / (priority+1));

however, having a good dentry replacement policy might be even better.

> FWIW, the profile with the new hash functions but small dcache started
> like this (__find_page and find_buffer have been taken out of inline for
> profiling here):
>
> 4893 d_lookup 23.5240
> 2741 do_anonymous_page 21.4141
> 1486 file_read_actor 18.5750
> 1475 do_wp_page 2.6721
> 1218 __get_free_pages 2.5805
> 1075 __find_page 15.8088
> 844 filemap_nopage 1.1405
> 684 brw_page 0.7403
> 600 lookup_dentry 1.2295
> 594 find_buffer 6.4565
> 567 page_fault 47.2500
> 564 handle_mm_fault 1.2261
> 523 __free_page 2.2543
> 439 free_pages 1.6140
> 420 do_con_write 0.2471
> 403 strlen_user 8.3958
> 391 zap_page_range 0.8806
> 382 do_page_fault 0.4799
>
> and with the larger dcache,
>
> 2434 do_anonymous_page 19.0156
> 1451 do_wp_page 2.6286
> 1343 file_read_actor 16.7875
> 1328 __find_page 19.5294
> 1149 __get_free_pages 2.4343
> 1112 d_lookup 5.3462
> 847 find_buffer 9.2065
> 847 filemap_nopage 1.1446
> 628 brw_page 0.6797
> 580 page_fault 48.3333
> 577 lookup_dentry 1.1824
> 563 handle_mm_fault 1.2239
> 543 __free_page 2.3405
> 414 do_con_write 0.2435
> 397 free_pages 1.4596
> 377 system_call 6.7321
> 356 strlen_user 7.4167
> 354 zap_page_range 0.7973
> 319 do_page_fault 0.4008
>
> Interestingly, do_anonymous_page, do_wp_page and file_read_actor are all
> places where we can probably optimise things to drop the kernel lock.
> That won't make them run faster but on SMP it will certainly let other
> CPUs get more kernel work done. Film at 11.

the normalized value for page_fault is still pretty high, around 48. is
there anything that can be done about that, or is it not a concern?

also i tried benchmarking a stock 2.2.5 kernel with a 12-bit inode hash,
and found performance gains as significant as those you've reported.

- Chuck Lever

--
corporate:	<chuckl@netscape.com>
personal:	<chucklever@netscape.net> or <cel@monkey.org>

The Linux Scalability project: http://www.citi.umich.edu/projects/citi-netscape/
