[PATCH] mm, oom: do not trigger out_of_memory from pagefault_out_of_memory

From: Michal Hocko
Date: Thu May 18 2017 - 04:35:09 EST


Roman Gushchin has noticed that we kill two tasks when the memory hog
killed from page fault path:
[ 25.721494] allocate invoked oom-killer: gfp_mask=0x14280ca(GFP_HIGHUSER_MOVABLE|__GFP_ZERO), nodemask=(null), order=0, oom_score_adj=0
[ 25.725658] allocate cpuset=/ mems_allowed=0
[ 25.727033] CPU: 1 PID: 492 Comm: allocate Not tainted 4.12.0-rc1-mm1+ #181
[ 25.729215] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Ubuntu-1.8.2-1ubuntu1 04/01/2014
[ 25.729598] Call Trace:
[ 25.729598] dump_stack+0x63/0x82
[ 25.729598] dump_header+0x97/0x21a
[ 25.729598] ? do_try_to_free_pages+0x2d7/0x360
[ 25.729598] ? security_capable_noaudit+0x45/0x60
[ 25.729598] oom_kill_process+0x219/0x3e0
[ 25.729598] out_of_memory+0x11d/0x480
[ 25.729598] __alloc_pages_slowpath+0xc84/0xd40
[ 25.729598] __alloc_pages_nodemask+0x245/0x260
[ 25.729598] alloc_pages_vma+0xa2/0x270
[ 25.729598] __handle_mm_fault+0xca9/0x10c0
[ 25.729598] handle_mm_fault+0xf3/0x210
[ 25.729598] __do_page_fault+0x240/0x4e0
[ 25.729598] trace_do_page_fault+0x37/0xe0
[ 25.729598] do_async_page_fault+0x19/0x70
[ 25.729598] async_page_fault+0x28/0x30

which can lead VM_FAULT_OOM and so to another out_of_memory when bailing
out from the #PF
[ 25.817589] allocate invoked oom-killer: gfp_mask=0x0(), nodemask=(null), order=0, oom_score_adj=0
[ 25.818821] allocate cpuset=/ mems_allowed=0
[ 25.819259] CPU: 1 PID: 492 Comm: allocate Not tainted 4.12.0-rc1-mm1+ #181
[ 25.819847] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Ubuntu-1.8.2-1ubuntu1 04/01/2014
[ 25.820549] Call Trace:
[ 25.820733] dump_stack+0x63/0x82
[ 25.820961] dump_header+0x97/0x21a
[ 25.820961] ? security_capable_noaudit+0x45/0x60
[ 25.820961] oom_kill_process+0x219/0x3e0
[ 25.820961] out_of_memory+0x11d/0x480
[ 25.820961] pagefault_out_of_memory+0x68/0x80
[ 25.820961] mm_fault_error+0x8f/0x190
[ 25.820961] ? handle_mm_fault+0xf3/0x210
[ 25.820961] __do_page_fault+0x4b2/0x4e0
[ 25.820961] trace_do_page_fault+0x37/0xe0
[ 25.820961] do_async_page_fault+0x19/0x70
[ 25.820961] async_page_fault+0x28/0x30

We wouldn't choose another task normally because oom_evaluate_task will
skip selecting another task while there is an existing oom victim but we
can race with the oom_reaper which can set MMF_OOM_SKIP and so select
another task. Tetsuo Handa has pointed out that 9a67f6488eca926f ("mm:
consolidate GFP_NOFAIL checks in the allocator slowpath") made this more
probable because prior to this patch we have retried the allocation with
access to memory reserves which is likely to succeed. We cannot rely on
that though. So rather than tweaking the allocation path it makes more
sense to revisit pagefault_out_of_memory itself. A lot of time could
have passed between the allocation failure (VM_FAULT_OOM) and
pagefault_out_of_memory so it is inherently risky to trigger
out_of_memory. Moreover we have lost the allocation context completely
so we cannot enforce the numa policy etc... Therefore we cannot retry
the allocation to check the OOM situation. Let's simply drop the
out_of_memory altogether. We still have to care about memcg OOM handling
because we do not do any memcg OOM actions in the allocation context.

Reporeted-by: Roman Gushchin <guro@xxxxxx>
Signed-off-by: Michal Hocko <mhocko@xxxxxxxx>
---
mm/oom_kill.c | 20 ++++----------------
1 file changed, 4 insertions(+), 16 deletions(-)

diff --git a/mm/oom_kill.c b/mm/oom_kill.c
index 04c9143a8625..13afa80abb4c 100644
--- a/mm/oom_kill.c
+++ b/mm/oom_kill.c
@@ -1051,25 +1051,13 @@ bool out_of_memory(struct oom_control *oc)
}

/*
- * The pagefault handler calls here because it is out of memory, so kill a
- * memory-hogging task. If oom_lock is held by somebody else, a parallel oom
- * killing is already in progress so do nothing.
+ * The pagefault handler calls here because some allocation has failed. We have
+ * to take care of the memcg OOM here because this is the only safe context without
+ * any locks held but let the oom killer triggered from the allocation context care
+ * about the global OOM.
*/
void pagefault_out_of_memory(void)
{
- struct oom_control oc = {
- .zonelist = NULL,
- .nodemask = NULL,
- .memcg = NULL,
- .gfp_mask = 0,
- .order = 0,
- };
-
if (mem_cgroup_oom_synchronize(true))
return;
-
- if (!mutex_trylock(&oom_lock))
- return;
- out_of_memory(&oc);
- mutex_unlock(&oom_lock);
}
--
2.11.0



--
Michal Hocko
SUSE Labs