Re: [RFC 0/3] Recursive reclaim (on __PF_MEMALLOC)

From: Peter Zijlstra
Date: Tue Aug 21 2007 - 11:29:48 EST


[ now with CCs ]

On Tue, 2007-08-21 at 02:28 +0200, Nick Piggin wrote:

> I do of course. There is one thing to have a real lock deadlock
> in some core path, and another to have this memory deadlock in a
> known-to-be-dodgy configuration (Linus said last year that he didn't
> want to go out of our way to support this, right?)... But if you can
> solve it without impacting fastpaths etc. then I don't see any
> objection to it.

That has been my intention, getting the problem solved without touching
fast paths and with minimal changes to how things are currently done.

> I don't mean for correctness, but for throughput. If you're doing a
> lot of network operations right near the memory limit, then it could
> be possible that these deadlock paths get triggered relatively often.
> With Christoph's patches, I think it would tend to be less.

Christoph's patches all rely on file backed memory being predominant.
[ and to a certain degree fully ignore anonymous memory loads :-( ]

Whereas quite a few realistic loads strive to minimise file backed
memory. I'll again fall back to my MPI cluster example: they want to use
so much anonymous memory to perform their calculations that nothing
except the hot paths of code stays present in memory. In these scenarios
1 MB of text would already be a lot.

> > > How are your deadlock patches going anyway? AFAIK they are mostly a network
> > > issue and I haven't been keeping up with them for a while.
> >
> > They really do rely on some VM interaction too, network does not have
> > enough information to break out of the deadlock on its own.
>
> The thing I don't much like about your patches is the addition of more
> of these global reserve type things in the allocators. They kind of
> suck (not your code, just the concept of them in general -- ie. including
> the PF_MEMALLOC reserve). I'd like to eventually reach a model where
> reclaimable memory from a given subsystem is always backed by enough
> resources to be able to reclaim it. What stopped you from going that
> route with the network subsystem? (too much churn, or something
> fundamental?)

I want to keep the patches as non-intrusive as possible, exactly
because some people consider this a fringe functionality. Doing as you
say does sound like a noble goal, but would require massive overhauls.

Also, I'm not quite sure how this would apply to networking. It
generally doesn't have much reclaimable memory sitting around, and it
heavily relies on kmalloc so an alloc/free cycle accounting system would
quickly involve a lot of the things I'm already doing.

(also one advantage of keeping it all in the buddy allocator is that it
can more easily form larger order pages)
