Re: page fault scalability patch V11 [0/7]: overview

From: Nick Piggin
Date: Sat Nov 20 2004 - 02:15:28 EST


Andrew Morton wrote:
Nick Piggin <nickpiggin@xxxxxxxxxxxx> wrote:

per thread rss


Given that we have contention problems updating a single mm-wide rss and
given that the way to fix that up is to spread things out a bit, it seems
wildly arbitrary to me that the way in which we choose to spread the
counter out is to stick a bit of it into each task_struct.

I'd expect that just shoving a pointer into mm_struct which points at a
dynamically allocated array[NR_CPUS] of longs would suffice. We probably
don't even need to spread them out on cachelines - having four or eight
cpus sharing the same cacheline probably isn't going to hurt much.

At least, that'd be my first attempt. If it's still not good enough, try
something else.



That is what Bill thought too. I guess per-cpu and per-thread rss are
the leading candidates.

Per thread rss has the benefits of cacheline exclusivity, and not
causing task bloat in the common case.

Per CPU array has better worst case /proc properties, but shares
cachelines (or not, if using percpu_counter as you suggested).


I think I'd better leave it to others to finish off the arguments ;)
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/