Re: [PATCH 1/5] mm: Add __GFP_NO_OOM_KILL flag

From: David Rientjes
Date: Thu May 07 2009 - 18:16:37 EST


On Thu, 7 May 2009, Andrew Morton wrote:

> - the standard way of controlling memory allocator behaviour is via
> the gfp_t. Bypassing that is an unusual step and needs a higher
> level of justification, which I'm not seeing here.
>

The standard way of controlling the oom killer behavior for a zone is via
the ZONE_OOM_LOCKED bit.

> - if we do this via an unusual global, we reduce the chances that
> another subsystem could use the new feature.
>
> I don't know what subsystem that might be, but I bet they're out
> there. checkpoint-restart, virtual machines, ballooning memory
> drivers, kexec loading, etc.
>

There are two separate issues here: the use of ZONE_OOM_LOCKED to control
whether or not to invoke the oom killer for a specific zone (which is
already its only function), and the fact that in this case we're doing it
for all zones. It seems like you're concerned with the latter, but the
distinction in the hibernation case is that no memory freeing would be
possible as the result of the oom killer for _all_ zones, so it makes
sense to lock them all out.

> > The fact is that _all_ allocations here are implicitly __GFP_NO_OOM_KILL
> > whether they specify it or not, since the oom killer would simply kill a
> > task in D state which can't exit or free memory, and subsequent allocations
> > would make the oom killer a no-op because there's an eligible task with
> > TIF_MEMDIE set. The only thing you're saving with __GFP_NO_OOM_KILL is
> > calling the oom killer in the first place and killing an unresponsive task,
> > but that would have to happen anyway when thawed since the system is oom
> > (or otherwise lock up for GFP_KERNEL with order < PAGE_ALLOC_COSTLY_ORDER).
>
> All the above is specific to the PM application only, when userspace
> tasks are stopped.
>
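
The quoted argument about frozen tasks can be modeled in a few lines of
userspace C. This is only a sketch under stated assumptions: struct task,
oom_kill() and pages_freed() are hypothetical names, victim selection by
largest rss stands in for the real badness heuristic, and the memdie field
merely imitates TIF_MEMDIE.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Userspace model only: these types imitate kernel concepts, not kernel code. */
struct task {
	const char *comm;
	bool frozen;		/* stopped for hibernation, cannot run or exit */
	bool memdie;		/* imitates TIF_MEMDIE: already picked as victim */
	unsigned long rss;	/* pages released if the task actually exits */
};

/*
 * Pick the task with the largest rss and mark it as the victim.
 * Returns NULL without marking anything if some task is already marked,
 * because the killer waits for that task to exit first.
 */
static struct task *oom_kill(struct task *tasks, size_t n)
{
	struct task *victim = NULL;
	size_t i;

	for (i = 0; i < n; i++) {
		if (tasks[i].memdie)
			return NULL;
		if (!victim || tasks[i].rss > victim->rss)
			victim = &tasks[i];
	}
	if (victim)
		victim->memdie = true;
	return victim;
}

/* Pages actually freed: a frozen victim never reaches its exit path. */
static unsigned long pages_freed(const struct task *victim)
{
	if (!victim || victim->frozen)
		return 0;
	return victim->rss;
}

/* Example: with every task frozen, one kill frees nothing and blocks the rest. */
static void demo_frozen_oom(void)
{
	struct task tasks[] = {
		{ "small", true, false, 100 },
		{ "large", true, false, 300 },
	};
	struct task *victim = oom_kill(tasks, 2);

	assert(victim == &tasks[1]);
	assert(pages_freed(victim) == 0);	/* frozen, so no memory comes back */
	assert(oom_kill(tasks, 2) == NULL);	/* later calls are no-ops */
}
```

In this model the first call selects a victim that cannot free anything,
and every later call returns early because a marked task still exists.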

I'm not arguing that the only way we can ever implement __GFP_NO_OOM_KILL
is for the entire system: we can set ZONE_OOM_LOCKED for only the zones in
the zonelist that is passed to the page allocator. For this particular
purpose, that is naturally all zones; for future use cases we may choose
to lock out only the zones we're allowed to allocate from in that
context.
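
To illustrate the difference between locking out all zones and locking out
only the zones of one zonelist, here is a minimal userspace sketch. It is
not kernel code: struct zone and struct zonelist are simplified stand-ins,
MAX_ZONES and the helper names try_lock_zonelist_oom() and
unlock_zonelist_oom() are hypothetical, and the all-or-nothing locking rule
is an assumption of the model.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Userspace model only: these types imitate kernel concepts, not kernel code. */
#define MAX_ZONES 8
#define ZONE_OOM_LOCKED 0x1UL	/* models the per-zone flag bit */

struct zone {
	const char *name;
	unsigned long flags;
};

/* Zones one allocation context may use; unused slots are NULL. */
struct zonelist {
	struct zone *zones[MAX_ZONES];
};

/*
 * Take the OOM lock on every zone in the zonelist.
 * Returns false and changes nothing if any zone is already locked.
 */
static bool try_lock_zonelist_oom(struct zonelist *zl)
{
	size_t i;

	for (i = 0; i < MAX_ZONES && zl->zones[i]; i++)
		if (zl->zones[i]->flags & ZONE_OOM_LOCKED)
			return false;
	for (i = 0; i < MAX_ZONES && zl->zones[i]; i++)
		zl->zones[i]->flags |= ZONE_OOM_LOCKED;
	return true;
}

/* Drop the OOM lock on every zone in the zonelist. */
static void unlock_zonelist_oom(struct zonelist *zl)
{
	size_t i;

	for (i = 0; i < MAX_ZONES && zl->zones[i]; i++)
		zl->zones[i]->flags &= ~ZONE_OOM_LOCKED;
}

/* Example: locking a subset leaves the remaining zones eligible. */
static void demo_zone_oom_lock(void)
{
	struct zone dma = { "DMA", 0 }, normal = { "Normal", 0 };
	struct zone highmem = { "HighMem", 0 };
	struct zonelist all = { { &dma, &normal, &highmem } };
	struct zonelist low = { { &dma, &normal } };
	struct zonelist high = { { &highmem } };

	/* Hibernation-style lockout: every zone is locked up front. */
	assert(try_lock_zonelist_oom(&all));
	assert(!try_lock_zonelist_oom(&low));	/* killer would be skipped */
	unlock_zonelist_oom(&all);

	/* Narrower lockout: only the zones of one context. */
	assert(try_lock_zonelist_oom(&low));
	assert(!try_lock_zonelist_oom(&all));	/* overlaps a locked zone */
	assert(try_lock_zonelist_oom(&high));	/* untouched zone is still free */
}
```

In this model an allocation whose zonelist overlaps a locked zone never
gets as far as invoking the oom killer, while allocations confined to
unlocked zones are unaffected.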

> It might well end up that stopping userspace (beforehand or before
> oom-killing) is a hard requirement for reliably disabling the
> oom-killer.

Yes, globally, but future use cases may disable the oom killer only for
specific zones, such as with memory hot-remove.