Re: page fault scalability patch V11 [0/7]: overview

From: Nick Piggin
Date: Fri Nov 19 2004 - 21:31:13 EST


William Lee Irwin III wrote:
On Fri, Nov 19, 2004 at 11:59:03AM -0800, Linus Torvalds wrote:

You could also make "rss" be a _signed_ integer per-thread.
When unmapping a page, you decrement one of the threads that shares the mm (doesn't matter which - which is why the per-thread rss may go negative), and when mapping a page you increment it.
Then, anybody who actually wants a global rss can just iterate over
threads and add it all up. If you do it under the mmap_sem, it's stable,
and if you do it outside the mmap_sem it's imprecise but stable in the
long term (ie errors never _accumulate_, like the non-atomic case will do).
Does anybody care enough? Maybe, maybe not. It certainly sounds a hell of a lot better than the periodic scan.


Unprivileged triggers for full-tasklist scans are NMI oops material.


What about pushing the per-thread rss delta back into the global atomic
rss counter in each schedule()?

Pros:
This would take the task exiting problem into its stride as a matter of
course.

Single atomic read to get rss.

Cons:
would just be moving the atomic op somewhere else if we don't get
many page faults per schedule.

Not really nice dependancies.

Assumes schedule (not context switch) must occur somewhat regularly.
At present this is not true for SCHED_FIFO tasks.


Too nasty?
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/