Re: [RFC][PATCH 2/9] deadlock prevention core

From: Andrew Morton
Date: Fri Aug 18 2006 - 18:39:33 EST


On Fri, 18 Aug 2006 14:22:07 -0700
Daniel Phillips <phillips@xxxxxxxxxx> wrote:

> Andrew Morton wrote:
> >Daniel Phillips wrote:
> >>Andrew Morton wrote:
> >
> > ...it's runtime configurable.
>
> So we default to "less than the best" because we are too lazy to fix the
> network starvation issue properly? Maybe we don't really need a mempool for
> struct bio either; isn't that one rather like the reserve pool we propose
> for sk_buffs? (And for that matter, isn't a bio rather more like an sk_buff
> than one would think from the code we have written?)
>
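(Aside, for anyone less familiar with the pattern being invoked here: a
mempool pre-allocates a minimum number of objects so that allocations on
the writeout path can always make forward progress under memory pressure.
The sketch below uses the stock mempool API with a made-up object type; it
illustrates the general idea only, and is neither the bio code nor the
proposed sk_buff reserve.)

/*
 * Illustration only: a reserve-backed allocator in the style of the
 * block layer's bio mempool.  "struct my_obj" is a made-up stand-in.
 */
#include <linux/mempool.h>
#include <linux/gfp.h>

#define MIN_RESERVED	16	/* objects guaranteed to be available */

struct my_obj {
	void *payload;
};

static mempool_t *my_pool;

static int my_reserve_setup(void)
{
	my_pool = mempool_create_kmalloc_pool(MIN_RESERVED,
					      sizeof(struct my_obj));
	return my_pool ? 0 : -ENOMEM;
}

static struct my_obj *my_obj_get(void)
{
	/*
	 * For sleeping allocations, mempool_alloc() dips into the
	 * reserve when the page allocator is under pressure, and waits
	 * for an object to be freed rather than failing outright.
	 */
	return mempool_alloc(my_pool, GFP_NOIO);
}

static void my_obj_put(struct my_obj *obj)
{
	mempool_free(obj, my_pool);	/* refills the reserve first */
}
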
> >>unavailable for write caching just to get around the network receive
> >> starvation issue?
> >
> > No, it's mainly to avoid latency: to prevent tasks which want to allocate
> > pages from getting stuck behind writeback.
>
> This seems like a great help for some common loads, but a regression for
> other not-so-rare loads. Regardless of whether or not marking 60% of memory
> as unusable for write caching is a great idea, it is not a compelling reason
> to fall for the hammer/nail trap.
>
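(For context, the throttling referred to above is the global dirty-memory
limit, runtime tunable through /proc/sys/vm/dirty_ratio and
/proc/sys/vm/dirty_background_ratio: once dirty pages cross a threshold
derived from that ratio, the task doing the dirtying is made to perform
writeback itself before it may dirty more.  A much-simplified sketch of
the check follows; the real logic lives in mm/page-writeback.c, and the
"example_" name is invented.)

/*
 * Greatly simplified sketch of dirty-memory throttling; the real
 * implementation is balance_dirty_pages() in mm/page-writeback.c.
 */
static int example_must_throttle(unsigned long total_pages,
				 unsigned long dirty_pages,
				 unsigned int dirty_ratio)	/* percent */
{
	unsigned long dirty_thresh = total_pages * dirty_ratio / 100;

	/*
	 * A task that dirties pages past the threshold is made to do
	 * writeback itself, so tasks that merely want to allocate
	 * pages do not get stuck behind a huge backlog of dirty data.
	 * With dirty_ratio = 40, for example, 60% of memory is in
	 * effect off limits to write caching, which is the figure
	 * debated above.
	 */
	return dirty_pages > dirty_thresh;
}
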
> >>What happens if some in-kernel user grabs 68% of kernel memory to do some
> >>very important thing? Does this starvation avoidance scheme still work?
> >
> > Well something has to give way. The process might get swapped out a bit,
> > or it might stall in the page allocator because of all the dirty memory.
>
> It is that "something has to give way" that makes me doubt the efficacy of
> our global write throttling scheme as a solution for network receive memory
> starvation. As it is, the thing that gives way may well be the network
> stack. Avoiding this hazard via a little bit of simple code is the whole
> point of our patch set.
>
> Yes, we did waltz off into some not-so-simple code at one point, but after
> discussing it, Peter and I agree that the simple approach actually works
> well enough. So we just need to trim down the patch set accordingly and
> thus prove that our approach really is nice, light and tight.
>
> Specifically, we do not actually need any fancy sk_buff sub-allocator
> (either ours or the one proposed by davem). We already got rid of the
> per-device reserve accounting in order to finesse away the sk_buff->dev oddity
> and shorten the patch. So the next rev will be rather lighter than the
> original.
>
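(To give the "simple approach" some concrete shape, here is one way a
single global reserve with no per-device accounting could look.  This is
an illustration only, not the actual patch; the "example_" names are
invented, and the bookkeeping for refilling the reserve when such skbs
are freed is omitted.)

/*
 * Illustration only: a single global sk_buff reserve, filled at init
 * time.  Not the actual patch; "example_" names are invented.
 */
#include <linux/skbuff.h>

#define EXAMPLE_RESERVE_SKBS	64
#define EXAMPLE_RESERVE_SIZE	2048	/* data bytes per reserved skb */

static struct sk_buff_head example_reserve;

static int example_reserve_init(void)
{
	int i;

	skb_queue_head_init(&example_reserve);
	for (i = 0; i < EXAMPLE_RESERVE_SKBS; i++) {
		struct sk_buff *skb = alloc_skb(EXAMPLE_RESERVE_SIZE,
						GFP_KERNEL);
		if (!skb)
			return -ENOMEM;
		skb_queue_head(&example_reserve, skb);
	}
	return 0;
}

/*
 * Receive-path allocation: try the normal allocator first.  Only
 * packets that may be needed to complete memory reclaim (for_reclaim,
 * e.g. replies for network block I/O) may fall back to the reserve;
 * everything else is simply dropped under pressure, which TCP-based
 * protocols tolerate.
 */
static struct sk_buff *example_alloc_rx_skb(unsigned int len, int for_reclaim)
{
	struct sk_buff *skb = alloc_skb(len, GFP_ATOMIC);

	if (skb || !for_reclaim || len > EXAMPLE_RESERVE_SIZE)
		return skb;

	return skb_dequeue(&example_reserve);	/* NULL if the reserve is empty */
}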

Sigh. Daniel, in my earlier emails I asked a number of questions regarding
whether existing facilities, queued patches or further slight kernel
changes could provide a sufficient solution to these problems. The answer
may well be "no". But diligence requires that we be able to prove that,
and handwaving doesn't provide that proof, does it?