Here are results that I obtained some time ago. The test is to concurrently
clone (with bk) and build (make -jN) the kernel source in M directories.
For N = M = 11, TIMEFORMAT='%3R %3S %3U':

                                           REAL      SYS      USER
"stock"                                3818.320  568.999  4358.460
transfer-dirty-on-refill               3368.690  569.066  4377.845
check-PageSwapCache-after-add-to-swap  3237.632  576.208  4381.248
dont-unmap-on-pageout                  3207.522  566.539  4374.504
async-writepage                        3115.338  562.702  4325.212
(check-PageSwapCache-after-add-to-swap has been merged into mainline since then.)
These patches haven't been updated for some time. The last version is at
ftp://ftp.namesys.com/pub/misc-patches/unsupported/extra/2004.03.25-2.6.5-rc2
[from Nick Piggin's patch]
> Changes mark_page_accessed to only set the PageAccessed bit, and
> not move pages around the LRUs. This means we don't have to take
> the lru_lock, and it also makes page ageing and scanning consistent
> and all handled in mm/vmscan.c.
By the way, the batch-mark_page_accessed patch at the URL above also tries
to reduce lock contention in mark_page_accessed(), but through the more
standard approach of batching target pages in a per-cpu pvec.