Re: [patch 08/11 -mmotm] oom: invoke oom killer for __GFP_NOFAIL

From: David Rientjes
Date: Mon May 11 2009 - 16:21:35 EST


On Mon, 11 May 2009, Dave Hansen wrote:

> So, there are three conditions that have to be met before we get to this
> behavior:
> 1. order > PAGE_ALLOC_COSTLY_ORDER
> 2. __GFP_NOFAIL set
> 3. no progress being made
>
> The two hunks of this patch really do different things to me.
> Personally, I'd split them up.
>
> This patch's first hunk effectively says, "When we have __GFP_NOFAIL
> set, it is now OK to OOM tasks for 'order > PAGE_ALLOC_COSTLY_ORDER'".
> That's a little bit goofy to me. Either OOMs help large-order allocs or
> they don't. __GFP_NOFAIL doesn't change how effective an OOM is.
>

Right, but the problem is that we don't know what the oom killer did
(until you get to patches 10 and 11 in this series). It would be easier
to use the __GFP_REPEAT retry logic to determine whether to continue
looping, but we currently bypass that because reclaim failed and we don't
know what the oom killer did. My later patches change that, so
__GFP_REPEAT actually works in this case.

> Yeah, adding a new warning or enhancing the existing "page allocation
> failure" warning would be nice.
>

Ok, it's a good idea that I never considered.

> Aren't you a bit worried that people will keep adding new 'goto nopage'
> cases in here? Would it be more effective to just put a:
>
> if (gfp_mask & __GFP_NOFAIL)
> goto restart;
>
> at the very end of the function? That should keep people from screwing
> this up in the future a bit better.
>

goto nopage is the only thing we have to worry about since the retry logic
does get invoked otherwise (and any order less than
PAGE_ALLOC_COSTLY_ORDER is actually implicitly __GFP_NOFAIL there,
regardless of reclaim progress). We do actually need to bypass that logic
for a couple of cases (GFP_THISNODE and GFP_ATOMIC) where reclaim is
inherently denied. We lack BUG_ON()'s for such cases undoubtedly because
of the performance impact in the very hot path.

> Thanks for going to the trouble of trying to sort all this code out a
> bit better. I know it is a mess. Could you try and beef up the
> descriptions a bit in the next pass? Some of the stuff about where
> you're encountering these situations would be really helpful, especially
> with 'git blame' these days.
>

Ok, thanks for the comments.
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/