Re: [RFC 0/7] Postphone reclaim laundry to write at high watermarks

From: Ingo Molnar
Date: Wed Aug 22 2007 - 03:45:34 EST



* Christoph Lameter <clameter@xxxxxxx> wrote:

> > I want slab to fail when a similar page alloc would fail, no magic.
>
> Yes I know. I do not want allocations to fail but that reclaim occurs
> in order to avoid failing any allocation. We need provisions that make
> sure that we never get into such a bad memory situation that would
> cause severe slowless and usually end up in a livelock anyways.

Could you outline the "big picture" as you see it? To me your argument
that reclaim can always be done instantly and that the cases where it
cannot be done are pathological and need to be avoided is fundamentally
dangerous and quite a bit short-sighted at first glance.

The big picture way to think about this is the following: the free page
pool is the "cache" of the MM. It's what "greases" the mechanism and
bridges the inevitable reclaim latency and makes "atomic memory"
available to the reclaim mechanism itself. We _cannot_ remove that cache
without a conceptual replacement (or a _very_ robust argument and proof
that the free pages pool is not needed at all - this would be a major
design change (and a stupid mistake IMO)). Your patchset, in essence,
tries to claim that we don't really need this cache and that all that
matters is to keep enough clean pagecache pages around. That approach
misses the full picture and I don't think we can progress without
agreeing on the fundamentals first.

That "cache" cannot be handled in your scheme: a fully or mostly
anonymous workload (tons of apps are like that) instantly destroys the
"there is always a minimal amount of atomically reclaimable pages
around" property of freelists, and this cannot be talked or tweaked
around by twiddling any existing property of anonymous reclaim.
Anonymous memory is dirty and takes ages to reclaim. The fact that your
patchset causes an easy anonymous OOM further underlines this flaw of
your thinking. Not making anonymous workloads OOM is the _hardest_ part
of the MM, by far. Pagecache reclaim is a breeze in comparison :-)

So there is a large and fundamental rift between having pages on the
freelist (instantly available to any context) and having them on the
(current) LRU where they might or might not be clean, etc. The freelists
are an implicit guarantee of buffering and atomicity and they can and do
save the day if everything else fails to keep stuff insta-freeable. (And
then we haven't even considered the performance and scalability
differences between picking from the pcp freelists versus picking pages
from the LRU, haven't considered the better higher-order page allocation
property of the buddy pool and haven't considered the atomicity of
in-irq-handler allocations.)
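
To make the freelist-versus-LRU distinction concrete, here is a toy
user-space sketch. It is not kernel code: struct toy_zone and both
functions are hypothetical names invented for illustration, and the
model assumes only three page states. It shows that an atomic caller
can be served from the freelist alone, while a caller that may sleep
can additionally drop a clean LRU page, and neither can use dirty
anonymous pages without I/O.

```c
#include <stdbool.h>
#include <stddef.h>

/* Toy model only; every name here is hypothetical, not a kernel API. */
struct toy_zone {
    size_t free_pages;      /* on the freelist: usable from any context */
    size_t lru_clean;       /* clean pagecache on the LRU: droppable */
    size_t lru_dirty_anon;  /* dirty/anonymous: needs I/O before reuse */
};

/* Atomic context (e.g. an irq handler): cannot sleep or reclaim,
 * so only the freelist can satisfy the request. */
static bool toy_alloc_atomic(struct toy_zone *z)
{
    if (z->free_pages == 0)
        return false;
    z->free_pages--;
    return true;
}

/* Sleeping context: tries the freelist first, then falls back to
 * dropping one clean LRU page and handing it to the caller. */
static bool toy_alloc_may_reclaim(struct toy_zone *z)
{
    if (toy_alloc_atomic(z))
        return true;
    if (z->lru_clean > 0) {
        z->lru_clean--;
        return true;
    }
    /* Only dirty anonymous pages remain: in this model they would
     * need swap I/O first, so the allocation fails here. */
    return false;
}
```

With a zone holding one free page, one clean LRU page and any number
of dirty anonymous pages, the second atomic allocation already fails,
and the second reclaiming allocation fails too: a mostly anonymous
workload leaves nothing that can be handed out instantly.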

Ingo