Re: [RFC 0/7] Postphone reclaim laundry to write at high water marks

From: Christoph Lameter
Date: Wed Aug 22 2007 - 15:19:25 EST


On Wed, 22 Aug 2007, Ingo Molnar wrote:

> Could you outline the "big picture" as you see it? To me your argument
> that reclaim can always be done instantly and that the cases where it
> cannot be done are pathological and need to be avoided is fundamentally
> dangerous and quite a bit short-sighted at first glance.

That is a bit overdrawing my argument. The issues that Peter saw can be
fixed by allowing recursive reclaim (see the earlier patchset). The rest
is so far sugar on top or building extreme cases where we already have
trouble.

> The big picture way to think about this is the following: the free page
> pool is the "cache" of the MM. It's what "greases" the mechanism and
> bridges the inevitable reclaim latency and makes "atomic memory"
> available to the reclaim mechanism itself. We _cannot_ remove that cache
> without a conceptual replacement (or a _very_ robust argument and proof
> that the free pages pool is not needed at all - this would be a major
> design change (and a stupid mistake IMO)). Your patchset, in essence,
> tries to claim that we dont really need this cache and that all that
> matters is to keep enough clean pagecache pages around. That approach
> misses the full picture and i dont think we can progress without
> agreeing on the fundamentals first.

The patchset attempts to deal with the reserves in a more intelligent
way in order not to fail when this pool becomes exhausted because some
device needs a lot of memory in the writeout path.

> That "cache" cannot be handled in your scheme: a fully or mostly
> anonymous workload (tons of apps are like that) instantly destroys the
> "there is always a minimal amount of atomically reclaimable pages
> around" property of freelists, and this cannot be talked or tweaked
> around by twiddling any existing property of anonymous reclaim.

A extreme anonymous workload like discussed here can even cause the
current VM to fail. Realistically at least portions of the executable and
varios slab caches will remain in memory in addition to the reserves.

> Anonymous memory is dirty and takes ages to reclaim. The fact that your
> patchset causes an easy anonymous OOM further underlines this flaw of
> your thinking. Not making anonymous workloads OOM is the _hardest_ part
> of the MM, by far. Pagecache reclaim is a breeze in comparison :-)

The central flaw in my thinking was the switching of of PF_MEMALLOC on the
writeout path instead of allowing recursive PF_MEMALLOC reclaim as in the
first patch. But the first patchset did not have that flaw.

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/