Re: [PATCH 1/5] mm: Add __GFP_NO_OOM_KILL flag

From: Andrew Morton
Date: Thu May 07 2009 - 18:51:36 EST


On Thu, 7 May 2009 15:16:17 -0700 (PDT)
David Rientjes <rientjes@xxxxxxxxxx> wrote:

> On Thu, 7 May 2009, Andrew Morton wrote:
>
> > - the standard way of controlling memory allocator behaviour is via
> > the gfp_t. Bypassing that is an unusual step and needs a higher
> > level of justification, which I'm not seeing here.
> >
>
> The standard way of controlling the oom killer behavior for a zone is via
> the ZONE_OOM_LOCKED bit.

Oops, I didn't remember/realise that ZONE_OOM_LOCKED already exists.

> > - if we do this via an unusual global, we reduce the chances that
> > another subsystem could use the new feature.
> >
> > I don't know what subsystem that might be, but I bet they're out
> > there. checkpoint-restart, virtual machines, ballooning memory
> > drivers, kexec loading, etc.
> >
>
> There are two separate issues here: the use of ZONE_OOM_LOCKED to control
> whether or not to invoke the oom killer for a specific zone (which is
> already its only function), and the fact that in this case we're doing it
> for all zones. It seems like you're concerned with the latter, but the
> distinction in the hibernation case is that no memory freeing would be
> possible as the result of the oom killer for _all_ zones, so it makes
> sense to lock them all out.

OK.

> > > The fact is that _all_ allocations here are implicitly __GFP_NO_OOM_KILL
> > > whether it specifies it or not since the oom killer would simply kill a
> > > task in D state which can't exit or free memory and subsequent allocations
> > > would make the oom killer a no-op because there's an eligible task with
> > > TIF_MEMDIE set. The only thing you're saving with __GFP_NO_OOM_KILL is
> > > calling the oom killer in the first place and killing an unresponsive task
> > > but that would have to happen anyway when thawed since the system is oom
> > > (or otherwise lock up for GFP_KERNEL with order < PAGE_ALLOC_COSTLY_ORDER).
> >
> > All the above is specific to the PM application only, when userspace
> > tasks are stopped.
> >
>
> I'm not arguing that the only way we can ever implement __GFP_NO_OOM_KILL
> is for the entire system: we can set ZONE_OOM_LOCKED for only the zones in
> the zonelist that are passed to the page allocator. For this particular
> purpose, that is naturally all zones; for other future use cases it may be
> chosen only to lock out the zones we're allowed to allocate from in that
> context.

OK.

> > It might well end up that stopping userspace (beforehand or before
> > oom-killing) is a hard requirement for reliably disabling the
> > oom-killer.
>
> Yes, globally, but future use cases may disable only specific zones such
> as with memory hot-remove.

<goes off to find out what ZONE_OOM_LOCKED does>

That took remarkably longer than one would have expected..

Yes, OK, I agree, globally setting ZONE_OOM_LOCKED would produce a
decent result.

The setting and clearing of that thing looks gruesomely racy..
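
To make the race concern concrete, here is a minimal userspace sketch of
setting and clearing a per-zone "oom locked" bit without a window between
the test and the set. It is not the kernel's code: struct sketch_zone,
SKETCH_OOM_LOCKED and the sketch_* helpers are hypothetical names, and C11
atomics stand in for whatever bit operations or locking the real
implementation would use. The list helper is all-or-nothing, so it covers
both the "all zones" hibernation case and the "only this zonelist" case
discussed above.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical flag bit; a stand-in for an "oom locked" zone flag. */
#define SKETCH_OOM_LOCKED 0x1u

/* Hypothetical stand-in for a zone; not the kernel's struct zone. */
struct sketch_zone {
    atomic_uint flags;
};

/* Atomically set the bit. True only if this caller was the one to set it. */
static bool sketch_zone_trylock(struct sketch_zone *z)
{
    unsigned int old = atomic_fetch_or(&z->flags, SKETCH_OOM_LOCKED);
    return (old & SKETCH_OOM_LOCKED) == 0;
}

/* Atomically clear the bit, leaving any other flag bits untouched. */
static void sketch_zone_unlock(struct sketch_zone *z)
{
    atomic_fetch_and(&z->flags, ~SKETCH_OOM_LOCKED);
}

/*
 * All-or-nothing: lock every zone in the list, or undo the zones we
 * locked ourselves and report failure. The zone that was already locked
 * belongs to someone else and is left alone.
 */
static bool sketch_zonelist_trylock(struct sketch_zone **zones, size_t n)
{
    assert(zones != NULL || n == 0);
    for (size_t i = 0; i < n; i++) {
        if (!sketch_zone_trylock(zones[i])) {
            while (i-- > 0)
                sketch_zone_unlock(zones[i]);
            return false;
        }
    }
    return true;
}

/* Release every zone in the list; the caller must hold all of them. */
static void sketch_zonelist_unlock(struct sketch_zone **zones, size_t n)
{
    for (size_t i = 0; i < n; i++)
        sketch_zone_unlock(zones[i]);
}
```

A caller would take the lock over its zonelist before deciding whether to
invoke the oom killer and drop it afterwards. Two callers with overlapping
lists can both fail and have to retry, which a single lock around the whole
scan would avoid at the cost of serialising everyone.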
