Re: [PATCH] mm: consolidate GFP_NOFAIL checks in the allocator slowpath

From: Hillf Danton
Date: Thu Dec 15 2016 - 03:00:53 EST


On Wednesday, December 14, 2016 11:07 PM Michal Hocko wrote:
> From: Michal Hocko <mhocko@xxxxxxxx>
>
> Tetsuo Handa has pointed out that 0a0337e0d1d1 ("mm, oom: rework oom
> detection") has subtly changed the semantics for costly high-order requests
> with __GFP_NOFAIL and without __GFP_REPEAT, and those can fail right now.
> My code inspection didn't reveal any such users in the tree but it is
> true that this might lead to unexpected allocation failures and
> subsequent OOPs.
>
> __alloc_pages_slowpath wrt. GFP_NOFAIL is hard to follow currently.
> There are a few special cases but we are lacking a catch-all place to be
> sure we will not miss any case where the non-failing allocation might
> fail. This patch reorganizes the code a bit and puts all those special
> cases under the nopage label, which is the generic go-to-fail path.
> Non-failing allocations are retried, and those that cannot retry, like
> non-sleeping allocations, go to the failure point directly. This should
> make the code flow much easier to follow and make it less error prone
> for future changes.
>
> While we are there we have to move the stall check up to catch
> potentially looping non-failing allocations.
>
> Signed-off-by: Michal Hocko <mhocko@xxxxxxxx>
> Acked-by: Vlastimil Babka <vbabka@xxxxxxx>
> ---
> Hi Andrew,
> this has been posted previously as a 2 patch series [1]. This is the first patch.
> The second one has generated a lot of discussion and Tetsuo has nacked it
> because he is worried about potential lockups. I have argued [2] that there
> are other aspects to consider but then later realized that there is a different
> risk in place which hasn't been considered before. There are some users who are
> performing a lot of __GFP_NOFAIL|GFP_NOFS requests and we certainly do not want to
> give them full access to memory reserves without invoking the OOM killer [3].
>
> For that reason I have dropped the second patch for now and will think about
> this some more. The first patch still makes some sense and I find it
> a useful cleanup, so I would ask you to merge it before I find a better
> solution for the other issue. There was no opposition to this patch so I guess
> it should be good to go.
>
> [1] http://lkml.kernel.org/r/20161201152517.27698-1-mhocko@xxxxxxxxxx
> [2] http://lkml.kernel.org/r/20161212084837.GB18163@xxxxxxxxxxxxxx
> [3] http://lkml.kernel.org/r/20161214103418.GH25573@xxxxxxxxxxxxxx
>
> mm/page_alloc.c | 68 ++++++++++++++++++++++++++++++++++-----------------------
> 1 file changed, 41 insertions(+), 27 deletions(-)
>
> diff --git a/mm/page_alloc.c b/mm/page_alloc.c
> index 3f2c9e535f7f..79b327d9c9a6 100644
> --- a/mm/page_alloc.c
> +++ b/mm/page_alloc.c
> @@ -3640,32 +3640,23 @@ __alloc_pages_slowpath(gfp_t gfp_mask, unsigned int order,
> goto got_pg;
>
> /* Caller is not willing to reclaim, we can't balance anything */
> - if (!can_direct_reclaim) {
> - /*
> - * All existing users of the __GFP_NOFAIL are blockable, so warn
> - * of any new users that actually allow this type of allocation
> - * to fail.
> - */
> - WARN_ON_ONCE(gfp_mask & __GFP_NOFAIL);
> + if (!can_direct_reclaim)
> goto nopage;
> +
> + /* Make sure we know about allocations which stall for too long */
> + if (time_after(jiffies, alloc_start + stall_timeout)) {
> + warn_alloc(gfp_mask,
> + "page allocation stalls for %ums, order:%u",
> + jiffies_to_msecs(jiffies-alloc_start), order);
> + stall_timeout += 10 * HZ;
> }
>
> /* Avoid recursion of direct reclaim */
> - if (current->flags & PF_MEMALLOC) {
> - /*
> - * __GFP_NOFAIL request from this context is rather bizarre
> - * because we cannot reclaim anything and only can loop waiting
> - * for somebody to do a work for us.
> - */
> - if (WARN_ON_ONCE(gfp_mask & __GFP_NOFAIL)) {
> - cond_resched();
> - goto retry;
> - }
> + if (current->flags & PF_MEMALLOC)
> goto nopage;
> - }
>
> /* Avoid allocations with no watermarks from looping endlessly */
> - if (test_thread_flag(TIF_MEMDIE) && !(gfp_mask & __GFP_NOFAIL))
> + if (test_thread_flag(TIF_MEMDIE))
> goto nopage;
>
Nit: currently we allow a TIF_MEMDIE & __GFP_NOFAIL request to
try direct reclaim. Are you intentionally taking that chance away?

Other than that, feel free to add
Acked-by: Hillf Danton <hillf.zj@xxxxxxxxxxxxxxx>
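
To make the nit concrete, here is a minimal user-space sketch of the early
checks before and after the patch, as I read the diff. It is not kernel code:
struct req, enum outcome, old_flow() and new_flow() are hypothetical names
made up for illustration, and each field only models the kernel test named in
its comment. RETRY here means jumping back to the retry label without reaching
direct reclaim in that pass.

```c
#include <stdbool.h>

/* Hypothetical stand-ins for the state the slowpath consults. */
struct req {
	bool nofail;             /* models gfp_mask & __GFP_NOFAIL */
	bool can_direct_reclaim; /* models can_direct_reclaim */
	bool pf_memalloc;        /* models current->flags & PF_MEMALLOC */
	bool tif_memdie;         /* models test_thread_flag(TIF_MEMDIE) */
};

enum outcome { TRY_RECLAIM, RETRY, FAIL };

/* Before the patch: each early check deals with __GFP_NOFAIL on its own. */
static enum outcome old_flow(const struct req *r)
{
	if (!r->can_direct_reclaim)
		return FAIL;			/* warns for nofail, then nopage */
	if (r->pf_memalloc)
		return r->nofail ? RETRY : FAIL;
	if (r->tif_memdie && !r->nofail)
		return FAIL;
	return TRY_RECLAIM;			/* TIF_MEMDIE + nofail lands here */
}

/* After the patch: early checks only jump to nopage, which decides. */
static enum outcome new_flow(const struct req *r)
{
	bool nopage = !r->can_direct_reclaim || r->pf_memalloc ||
		      r->tif_memdie;

	if (!nopage)
		return TRY_RECLAIM;
	if (r->nofail)				/* consolidated nopage handling */
		return r->can_direct_reclaim ? RETRY : FAIL;
	return FAIL;
}
```

With tif_memdie, nofail and can_direct_reclaim set and pf_memalloc clear,
old_flow() returns TRY_RECLAIM while new_flow() returns RETRY, which is the
behavior change asked about above. The other combinations give the same
result in both versions of this model.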

>
> @@ -3692,14 +3683,6 @@ __alloc_pages_slowpath(gfp_t gfp_mask, unsigned int order,
> if (order > PAGE_ALLOC_COSTLY_ORDER && !(gfp_mask & __GFP_REPEAT))
> goto nopage;
>
> - /* Make sure we know about allocations which stall for too long */
> - if (time_after(jiffies, alloc_start + stall_timeout)) {
> - warn_alloc(gfp_mask,
> - "page allocation stalls for %ums, order:%u",
> - jiffies_to_msecs(jiffies-alloc_start), order);
> - stall_timeout += 10 * HZ;
> - }
> -
> if (should_reclaim_retry(gfp_mask, order, ac, alloc_flags,
> did_some_progress > 0, &no_progress_loops))
> goto retry;
> @@ -3728,6 +3711,37 @@ __alloc_pages_slowpath(gfp_t gfp_mask, unsigned int order,
> }
>
> nopage:
> + /*
> + * Make sure that __GFP_NOFAIL request doesn't leak out and make sure
> + * we always retry
> + */
> + if (gfp_mask & __GFP_NOFAIL) {
> + /*
> + * All existing users of the __GFP_NOFAIL are blockable, so warn
> + * of any new users that actually require GFP_NOWAIT
> + */
> + if (WARN_ON_ONCE(!can_direct_reclaim))
> + goto fail;
> +
> + /*
> + * PF_MEMALLOC request from this context is rather bizarre
> + * because we cannot reclaim anything and only can loop waiting
> + * for somebody to do a work for us
> + */
> + WARN_ON_ONCE(current->flags & PF_MEMALLOC);
> +
> + /*
> + * non failing costly orders are a hard requirement which we
> + * are not prepared for much so let's warn about these users
> + * so that we can identify them and convert them to something
> + * else.
> + */
> + WARN_ON_ONCE(order > PAGE_ALLOC_COSTLY_ORDER);
> +
> + cond_resched();
> + goto retry;
> + }
> +fail:
> warn_alloc(gfp_mask,
> "page allocation failure: order:%u", order);
> got_pg:
> --
> 2.10.2