Re: [PATCH] mm, oom: Give __GFP_NOFAIL allocations access to memory reserves

From: Vlastimil Babka
Date: Mon Nov 23 2015 - 04:43:49 EST


On 11/23/2015 10:29 AM, Michal Hocko wrote:
On Sun 22-11-15 13:55:31, Vlastimil Babka wrote:
On 11.11.2015 14:48, mhocko@xxxxxxxxxx wrote:
mm/page_alloc.c | 10 +++++++++-
1 file changed, 9 insertions(+), 1 deletion(-)

diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index 8034909faad2..d30bce9d7ac8 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -2766,8 +2766,16 @@ __alloc_pages_may_oom(gfp_t gfp_mask, unsigned int order,
goto out;
}
/* Exhausted what can be done so it's blamo time */
- if (out_of_memory(&oc) || WARN_ON_ONCE(gfp_mask & __GFP_NOFAIL))
+ if (out_of_memory(&oc) || WARN_ON_ONCE(gfp_mask & __GFP_NOFAIL)) {
*did_some_progress = 1;
+
+ if (gfp_mask & __GFP_NOFAIL) {
+ page = get_page_from_freelist(gfp_mask, order,
+ ALLOC_NO_WATERMARKS|ALLOC_CPUSET, ac);
+ WARN_ONCE(!page, "Unable to fullfil gfp_nofail allocation."
+ " Consider increasing min_free_kbytes.\n");

It seems redundant to me to keep the WARN_ON_ONCE also above in the if () part?

They are warning about two different things. The first one catches
buggy code which uses __GFP_NOFAIL from an OOM-disabled context while the

Ah, I see, I misinterpreted what the return values of out_of_memory() mean. But now that I look at its code, it seems to return false only when oom_killer_disabled is set to true, which is a global setting and has nothing to do with the context of the __GFP_NOFAIL allocation?

second one tries to help the administrator with a hint that memory
reserves are too small.

Also s/gfp_nofail/GFP_NOFAIL/ for consistency?

Fair enough, changed.

Hm, and this is probably out of scope of your patch, but I understand the
WARN_ONCE (WARN_ON_ONCE) to be _ONCE just to prevent a flood from a single task
looping here. But for distinct tasks, potentially far apart in time, wouldn't we
want to see all the warnings? Would that be feasible to implement?

I was thinking about that as well some time ago but it was quite
hard to find a good enough API to tell when to warn again. The first
WARN_ON_ONCE should trigger for all different _code paths_ no matter
how frequently they appear to catch all the buggy callers. The second
one would benefit from a new warning after min_free_kbytes was updated
because it would tell the administrator that the last update was not
sufficient for the workload.
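
One way to picture the "warn again after min_free_kbytes was updated" behaviour is to remember the value that was in effect at the last warning. This is a minimal userspace sketch, not kernel code and not an API proposed in this thread: min_free_kbytes_sketch, warned_at_kbytes and should_warn_reserves() are hypothetical names, and locking is ignored entirely.

```c
#include <stdbool.h>

/* Hypothetical model of the min_free_kbytes sysctl value. */
static int min_free_kbytes_sketch = 1024;

/* Value that was in effect when we last warned; -1 means "never warned". */
static int warned_at_kbytes = -1;

/*
 * Returns true when a reserves warning should be printed: either we have
 * never warned, or the administrator changed the setting since the last
 * warning. Not thread-safe; a real implementation would need locking or
 * atomics.
 */
static bool should_warn_reserves(void)
{
	if (warned_at_kbytes == min_free_kbytes_sketch)
		return false;

	warned_at_kbytes = min_free_kbytes_sketch;
	return true;
}
```

With this model the warning fires once per setting, so a repeat after an update tells the administrator that the new value is still not sufficient.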

Hm, what about adding a flag to struct alloc_context, so that once a particular allocation attempt emits the warning, it sets the flag and does not emit it again for as long as it keeps looping and attempting OOM? Other allocations would still warn independently.
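
The per-allocation flag idea can be sketched roughly as follows. This is a userspace model under stated assumptions, not a patch: struct alloc_context_sketch stands in for the kernel's struct alloc_context, and the nofail_warned field and both helper functions are hypothetical.

```c
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical stand-in for the kernel's struct alloc_context. */
struct alloc_context_sketch {
	bool nofail_warned;	/* assumed new field: this attempt already warned */
};

/* Emits the warning at most once per allocation context. */
static bool warn_nofail_once_per_alloc(struct alloc_context_sketch *ac)
{
	if (ac->nofail_warned)
		return false;

	ac->nofail_warned = true;
	fprintf(stderr, "Unable to fulfill __GFP_NOFAIL allocation\n");
	return true;
}

/*
 * Models a single allocation that passes through the OOM path
 * `retries` times; returns how many warnings it produced.
 */
static int simulate_alloc(int retries)
{
	struct alloc_context_sketch ac = { .nofail_warned = false };
	int warnings = 0;

	for (int i = 0; i < retries; i++)
		if (warn_nofail_once_per_alloc(&ac))
			warnings++;

	return warnings;
}
```

Because the flag lives in the per-allocation context rather than in a static variable, a looping task stays quiet after its first warning while every other allocation still gets its own.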

We could also print the same info as the "allocation failed" warnings do, since the situation is very similar, except that we can't fail. The admin/bug reporter should be interested in the same details as for an allocation that is allowed to fail. But it's also true that we have probably just printed the info during out_of_memory()... except when we skipped that for some reason?


+ }
+ }
out:
mutex_unlock(&oom_lock);
return page;


Thanks!

