This experiment is incremental on top of lowmem_reserve-3 (downloadable
from the same place) and is against 2.6.9. It rejects against kernel CVS,
but that should be easy to fix up.
http://www.kernel.org/pub/linux/kernel/people/andrea/patches/v2.6/2.6.9/PG_zero-2
One fix included in the patch is to fall back to the quicklists of the
whole classzone before eating from the buddy. Otherwise 1G boxes are
heavily penalized: they enter the buddy system too early instead of
using the quicklists of the lower zones (2.4-aa wasn't penalized). The
patch also adds a sysctl so the thing is tunable at runtime. And there
was no need for two quicklists for cold and hot pages: fewer resources
are wasted by just using the lru ordering to differentiate between
hot/cold allocations and hot/cold freeing.
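To make the classzone fallback and the single-list hot/cold ordering concrete, here is a minimal userspace sketch. It is not code from the patch: the struct layout, the function names (ql_free, ql_alloc, classzone_quicklist_alloc), the highest-zone-first walk, and the "hot at the head, cold at the tail" convention are all assumptions made for illustration.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical model, not the patch's data structures. */
enum { ZONE_DMA, ZONE_NORMAL, ZONE_HIGHMEM, NR_ZONES };

struct page {
    struct page *prev, *next;
};

/* One list per zone; position in the list encodes hot vs cold. */
struct quicklist {
    struct page *head; /* assumed hot end: most recently freed hot page */
    struct page *tail; /* assumed cold end */
    int nr;
};

/* Hot frees go to the head, cold frees to the tail. */
static void ql_free(struct quicklist *ql, struct page *p, int cold)
{
    p->prev = p->next = NULL;
    if (!ql->head) {
        ql->head = ql->tail = p;
    } else if (cold) {
        p->prev = ql->tail;
        ql->tail->next = p;
        ql->tail = p;
    } else {
        p->next = ql->head;
        ql->head->prev = p;
        ql->head = p;
    }
    ql->nr++;
}

/* Hot allocations take from the head, cold allocations from the tail. */
static struct page *ql_alloc(struct quicklist *ql, int cold)
{
    struct page *p = cold ? ql->tail : ql->head;

    if (!p)
        return NULL;
    if (p->prev)
        p->prev->next = p->next;
    else
        ql->head = p->next;
    if (p->next)
        p->next->prev = p->prev;
    else
        ql->tail = p->prev;
    p->prev = p->next = NULL;
    ql->nr--;
    return p;
}

/*
 * Try the quicklist of every zone in the classzone, highest zone first.
 * NULL means all of them were empty and the caller goes to the buddy.
 */
static struct page *classzone_quicklist_alloc(struct quicklist ql[NR_ZONES],
                                              int classzone, int cold)
{
    for (int z = classzone; z >= 0; z--) {
        struct page *p = ql_alloc(&ql[z], cold);
        if (p)
            return p;
    }
    return NULL;
}
```

In this model a ZONE_HIGHMEM request with an empty highmem quicklist is still served from the ZONE_NORMAL quicklist, which is the case that matters when the highest zone is small.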
The API with PG_zero is that if you set __GFP_ZERO in the gfp_mask, then
you must check PG_zero. If PG_zero is set, you don't need to clear
the page. However, you must clear PG_zero before freeing the page if its
contents are no longer zero by the time you free it, or future users
of __GFP_ZERO will be screwed. The pagetables, for example, never clear
PG_zero for the whole lifetime of the page; in fact they set PG_zero if
they're forced to execute clear_page. Likewise shmem: in a fail path, if
it fails to get an entry, it frees the zero page again and doesn't
have to touch PG_zero since it didn't modify the page contents.
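The contract above can be modeled in userspace roughly as follows. This is only a sketch under stated assumptions: alloc_page_model, free_page_model, get_zeroed_page_model, put_page_model, the pg_zero field and GFP_ZERO are hypothetical stand-ins, not the kernel interfaces, and clearing the flag for callers that did not ask for zeroed memory is my assumption about how such callers are kept safe.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>
#include <string.h>

#define PAGE_SIZE 4096
#define GFP_ZERO  0x1u /* stand-in for __GFP_ZERO */

struct page {
    bool pg_zero;      /* stand-in for the PG_zero page flag */
    struct page *next;
    unsigned char data[PAGE_SIZE];
};

static struct page *free_list; /* toy allocator: a LIFO of freed pages */

static struct page *alloc_page_model(unsigned gfp_mask)
{
    struct page *p = free_list;

    if (p) {
        free_list = p->next;
    } else {
        p = malloc(sizeof(*p));
        if (!p)
            return NULL;
        memset(p->data, 0xaa, PAGE_SIZE); /* simulate stale contents */
        p->pg_zero = false;
    }
    p->next = NULL;
    /* Assumption: callers that don't pass GFP_ZERO never see the flag set. */
    if (!(gfp_mask & GFP_ZERO))
        p->pg_zero = false;
    return p;
}

static void free_page_model(struct page *p)
{
    p->next = free_list;
    free_list = p;
}

/* Caller side: ask for GFP_ZERO, then check the flag before clearing. */
static struct page *get_zeroed_page_model(int *cleared)
{
    struct page *p = alloc_page_model(GFP_ZERO);

    if (!p)
        return NULL;
    *cleared = 0;
    if (!p->pg_zero) {
        memset(p->data, 0, PAGE_SIZE);
        p->pg_zero = true; /* the page is known to be zero from now on */
        *cleared = 1;
    }
    return p;
}

/* Caller side: drop the flag first if the page was dirtied. */
static void put_page_model(struct page *p, bool still_zero)
{
    if (!still_zero)
        p->pg_zero = false;
    free_page_model(p);
}
```

A page freed with still_zero set to true comes back from get_zeroed_page_model() without a second memset; a page freed with still_zero false gets cleared again on the next zeroed allocation.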
An obvious improvement would be to implement a long_write_zero(ptr)
operation that doesn't pollute the cache. IIRC it exists on the alpha, and
I assume it exists on x86/x86-64 too. But that's incremental on top of
this.
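As a rough userspace illustration of a cache-bypassing clear on x86/x86-64, the sketch below uses the SSE2 non-temporal store intrinsic _mm_stream_si128. It is not part of the patch and not kernel code (in-kernel use of SSE registers needs extra care). The name clear_page_nocache is hypothetical, and the memset fallback for builds without SSE2 is my addition.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>
#if defined(__SSE2__)
#include <emmintrin.h>
#endif

#define PAGE_SIZE 4096

/*
 * Zero one page. With SSE2 the stores are non-temporal, i.e. they are
 * hinted to bypass the cache. The page must be 16-byte aligned.
 */
static void clear_page_nocache(void *page)
{
#if defined(__SSE2__)
    __m128i zero = _mm_setzero_si128();
    __m128i *p = page;

    for (size_t i = 0; i < PAGE_SIZE / sizeof(__m128i); i++)
        _mm_stream_si128(&p[i], zero);
    _mm_sfence(); /* order the streaming stores before later stores */
#else
    memset(page, 0, PAGE_SIZE); /* portable fallback, goes through the cache */
#endif
}
```

Whether this wins depends on who touches the page next: if the page is used right after being cleared, the data is no longer in cache and has to be fetched from memory.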
It seems stable; I'm running it while writing this.
I guess testing on a memory-bound architecture would be more interesting
(more cpus will make it somewhat more memory bound).
Comments welcome.