Re: [patch 08/11 -mmotm] oom: invoke oom killer for __GFP_NOFAIL

From: David Rientjes
Date: Mon May 11 2009 - 15:10:24 EST


On Mon, 11 May 2009, Dave Hansen wrote:

> Could you explain a little more about why you think this scenario works
> for you? Are large contiguous areas of memory pinned by the task
> which you want to get killed? Why wasn't swapping effective
> against this task? Was the task itself taking up a large portion of
> total memory?
>

We frequently do cpuset-constrained oom kills where the lion's share of
memory on a set of nodes is allocated by a single task or a group of
threads all sharing the same memory. Swapping is largely effective but at
this point in the code path it's obviously not making any progress in
freeing pages. So this change fixes two issues:

- __GFP_NOFAIL allocations should not be allowed to return NULL, and

- we should prevent looping endlessly in the page allocator if reclaim
cannot free the requisite amount of memory.
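
To make the two points above concrete, here is a rough user-space sketch of
the intended control flow. It is not the patch itself and not the kernel's
page allocator: every name in it (the sketch_* functions, struct sketch_zone,
SKETCH_GFP_NOFAIL) is a hypothetical stand-in, and the zone is modelled as a
few page counters. It also simplifies the non-NOFAIL case to a plain failure.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical flag for this sketch; not the kernel's __GFP_NOFAIL value. */
#define SKETCH_GFP_NOFAIL 0x1u

/* Toy model of the memory a cpuset-constrained allocation can use. */
struct sketch_zone {
	unsigned long free_pages;   /* pages currently free */
	unsigned long reclaimable;  /* pages reclaim is still able to free */
	unsigned long victim_pages; /* pages held by the largest task */
	unsigned int oom_kills;     /* how often the oom killer ran */
};

/* Take 2^order pages if they are available. */
static bool sketch_try_alloc(struct sketch_zone *z, unsigned int order)
{
	unsigned long need = 1UL << order;

	if (z->free_pages < need)
		return false;
	z->free_pages -= need;
	return true;
}

/* Free up to 'need' pages; returns how many pages were actually freed. */
static unsigned long sketch_reclaim(struct sketch_zone *z, unsigned long need)
{
	unsigned long freed = z->reclaimable < need ? z->reclaimable : need;

	z->reclaimable -= freed;
	z->free_pages += freed;
	return freed;
}

/* Kill the largest task and return its memory to the free pool. */
static void sketch_oom_kill(struct sketch_zone *z)
{
	z->free_pages += z->victim_pages;
	z->victim_pages = 0;
	z->oom_kills++;
}

/* Returns true on success; false stands in for returning NULL. */
static bool sketch_alloc_slowpath(struct sketch_zone *z, unsigned int flags,
				  unsigned int order)
{
	for (;;) {
		if (sketch_try_alloc(z, order))
			return true;

		/* Keep retrying only while reclaim makes progress. */
		if (sketch_reclaim(z, 1UL << order) > 0)
			continue;

		/* Simplification: without NOFAIL, just fail here. */
		if (!(flags & SKETCH_GFP_NOFAIL))
			return false;

		/*
		 * NOFAIL must not return NULL and must not spin forever,
		 * so free memory by killing a task. The assert is a
		 * sketch-only guard that a victim still exists.
		 */
		assert(z->victim_pages > 0);
		sketch_oom_kill(z);
	}
}
```

With 2 free pages, nothing reclaimable and a 64-page victim, an order-4
SKETCH_GFP_NOFAIL request in this model succeeds after exactly one kill,
while the same request without the flag fails without killing anything.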

There is no reason that the oom killer would not be able to kill a task
that could free 64K of contiguous memory, especially for those that
mlock() their memory. You could argue that any __GFP_NOFAIL allocation
above order 3 is insane and should not kill tasks, but that's an issue
higher up the stack. If you'd like to identify such instances, we could
emit a warning message here and a stack trace.
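
For reference, the sizes involved and the kind of check such a warning
would need can be sketched in user-space C as follows. This assumes 4K
pages, under which order 3 is 32K and order 4 is the 64K mentioned above.
All demo_* and DEMO_* names are hypothetical. In the kernel itself the
check would presumably use something like WARN_ON_ONCE() to get the stack
trace; this sketch only prints a message.

```c
#include <stdbool.h>
#include <stdio.h>

#define DEMO_PAGE_SIZE    4096UL /* assumption: 4K pages */
#define DEMO_COSTLY_ORDER 3U     /* the "order 3" threshold from the text */
#define DEMO_GFP_NOFAIL   0x1u   /* hypothetical flag value for this sketch */

/* Size in bytes of a contiguous allocation of the given order. */
static unsigned long demo_order_to_bytes(unsigned int order)
{
	return DEMO_PAGE_SIZE << order;
}

/*
 * Report NOFAIL allocations above the threshold order.
 * Returns true if a warning was emitted.
 */
static bool demo_warn_costly_nofail(unsigned int flags, unsigned int order)
{
	if (!(flags & DEMO_GFP_NOFAIL) || order <= DEMO_COSTLY_ORDER)
		return false;

	fprintf(stderr,
		"warning: __GFP_NOFAIL allocation of order %u (%lu bytes)\n",
		order, demo_order_to_bytes(order));
	return true;
}
```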