[PATCH] mm, oom: do not fail __GFP_NOFAIL allocation if oom killer is disabled

From: Michal Hocko
Date: Mon Feb 23 2015 - 04:33:30 EST


Tetsuo Handa has pointed out that __GFP_NOFAIL allocations might fail
after the OOM killer is disabled if the allocation is performed by a
kernel thread. This behavior has been present from the very beginning,
since 7f33d49a2ed5 (mm, PM/Freezer: Disable OOM killer when tasks are
frozen). It means that the basic contract of the allocation request is
broken and the context requesting such an allocation might blow up
unexpectedly.

There are basically two ways forward:
1) Move oom_killer_disable after kernel threads are frozen. This has a
   risk that the OOM victim wouldn't be able to finish because it would
   depend on an already frozen kernel thread. This would be really
   tricky to debug.
2) Do not fail __GFP_NOFAIL allocations no matter what, and accept the
   risk that freezable kernel threads will loop and fail the suspend.
   Incidental allocations after kernel threads are frozen will at least
   dump a warning, if we are lucky and the serial console is still
   active, of course...

This patch implements the latter option because it is safer. We would see
a warning rather than allocation failures for the kernel threads which
would blow up otherwise, and we would have a higher chance of identifying
__GFP_NOFAIL users from deeper pm code.
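
The decision described above can be illustrated with a small userspace
model. This is only a sketch and not kernel code: every model_* name and
the MODEL_GFP_NOFAIL value are made up for illustration, and it assumes
that out_of_memory() reports no progress while the OOM killer is
disabled.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical flag value for this model; not the kernel's definition. */
#define MODEL_GFP_NOFAIL 0x1u

/* Stand-in for the state toggled by oom_killer_disable() during suspend. */
bool model_oom_killer_disabled;

/* Number of warnings emitted so far. */
int model_warnings;

/* Assumed behavior: no progress is possible while the OOM killer is off. */
static bool model_out_of_memory(void)
{
	return !model_oom_killer_disabled;
}

/* Model of WARN_ON_ONCE(cond): warn on the first true cond, return cond. */
static bool model_warn_on_once(bool cond)
{
	static bool warned;

	if (cond && !warned) {
		warned = true;
		model_warnings++;
		fprintf(stderr, "WARNING: NOFAIL allocation, OOM killer disabled\n");
	}
	return cond;
}

/* Mirrors the patched condition: true means "progress made, retry". */
bool model_may_oom_progress(unsigned int gfp_mask)
{
	return model_out_of_memory() ||
	       model_warn_on_once((gfp_mask & MODEL_GFP_NOFAIL) != 0);
}

void model_example(void)
{
	/* OOM killer enabled: out_of_memory() makes progress, no warning. */
	model_oom_killer_disabled = false;
	assert(model_may_oom_progress(MODEL_GFP_NOFAIL));
	assert(model_warnings == 0);

	/* Disabled, ordinary allocation: no progress, so it may fail. */
	model_oom_killer_disabled = true;
	assert(!model_may_oom_progress(0));

	/* Disabled, NOFAIL: progress is reported and the warning fires once. */
	assert(model_may_oom_progress(MODEL_GFP_NOFAIL));
	assert(model_may_oom_progress(MODEL_GFP_NOFAIL));
	assert(model_warnings == 1);
}
```

In the model, reporting progress stands for the allocator retrying
instead of returning NULL, which is why a __GFP_NOFAIL caller would loop
rather than see a failure.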

Changes since v1
- move the __GFP_NOFAIL check to __alloc_pages_may_oom per David
Rientjes
- replace WARN by WARN_ON_ONCE as per Johannes Weiner

Signed-off-by: Michal Hocko <mhocko@xxxxxxx>
---
 mm/page_alloc.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index 2d224bbdf8e8..c2ff40a30003 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -2363,7 +2363,8 @@ __alloc_pages_may_oom(gfp_t gfp_mask, unsigned int order,
 			goto out;
 	}
 	/* Exhausted what can be done so it's blamo time */
-	if (out_of_memory(ac->zonelist, gfp_mask, order, ac->nodemask, false))
+	if (out_of_memory(ac->zonelist, gfp_mask, order, ac->nodemask, false)
+			|| WARN_ON_ONCE(gfp_mask & __GFP_NOFAIL))
 		*did_some_progress = 1;
 out:
 	oom_zonelist_unlock(ac->zonelist, gfp_mask);
--
2.1.4

--
Michal Hocko
SUSE Labs