Re: PG_zero

From: Nick Piggin
Date: Mon Nov 01 2004 - 13:35:45 EST


Andrea Arcangeli wrote:
This experiment is incremental with lowmem_reserve-3 (downloadble in the
same place), and it's against 2.6.9, it rejects against kernel CVS but
it should be easy to fixup.

http://www.kernel.org/pub/linux/kernel/people/andrea/patches/v2.6/2.6.9/PG_zero-2


...

Some fix included in the patch is to fallback in the quicklist for the
whole classzone before eating from the buddy, otherwise 1G boxes are
very penalized in terms of entering the buddy system too early, and not
using the quicklists of the lower zones (2.4-aa wasn't penalized). Plus
this adds a sysctl so the thing is tunable at runtime. And there was no
need of using two quicklists for cold and hot pages, less resources are
wasted by just using the lru ordering to diferentiate from hot/cold
allocations and hot/cold freeing.


Not sure if this is wise. Reclaimed pages should definitely be cache
cold. Other freeing is assumed cache hot and LRU ordered on the hot
list which seems right... but I think you want the cold list for page
reclaim, don't you?

The API with PG_zero is that if you set __GFP_ZERO in the gfp_mask, then
you must check PG_zero. If PG_zero is set, then you don't need to clear
the page. However you must clear PG_zero before freeing the page if its
contents are not zero anymore by the time you free it, or future users
of __GFP_ZERO will be screwed. So the pagetables for example never clear
PG_zero for the whole duration of the page, infact they set PG_zero if
they're forced to execute clear_page. shmem as well in a fail path if it
fails getting an entry it will free the zero page again and it won't
have to touch PG_zero since it didn't modify the page contents.


Could the API be made nicer by clearing the page for you if it didn't
find a PG_zero page?


Obvious improvements would be to implement a long_write_zero(ptr)
operation that doesn't pollute the cache. IIRC it exists on the alpha, I
assume it exists on x86/x86-64 too. But that's incremental on top of
this.

It seems stable, I'm running it while writing this.

I guess testing on a memory bound architecture would be more interesting
(more cpus will make it more memory bound somewhat).

Comments welcome.


I have the feeling that it might not be worthwhile doing zero on idle.
You've got chance of blowing the cache on zeroing pages that won't be
used for a while. If you do uncached writes then you've changed the
problem to memory bandwidth (I guess doesn't matter much on UP).

It seems like a good idea to do zero pages in the page allocator if
at all (rather than slab), but I guess you don't want to complicate
it unless it shows improvements in macro benchmarks.

Sorry my feedback isn't much, it is not based on previous experience.
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/