Re: 4.8.8 kernel trigger OOM killer repeatedly when I have lots of RAM that should be free

From: Marc MERLIN
Date: Tue Nov 29 2016 - 11:16:08 EST


On Mon, Nov 28, 2016 at 08:23:15AM +0100, Michal Hocko wrote:
> Marc, could you try this patch please? I think it should be pretty clear
> it should help you but running it through your use case would be more
> than welcome before I ask Greg to take this to the 4.8 stable tree.
>
> Thanks!
>
> On Wed 23-11-16 07:34:10, Michal Hocko wrote:
> [...]
> > commit b2ccdcb731b666aa28f86483656c39c5e53828c7
> > Author: Michal Hocko <mhocko@xxxxxxxx>
> > Date: Wed Nov 23 07:26:30 2016 +0100
> >
> > mm, oom: stop pre-mature high-order OOM killer invocations
> >
> > 31e49bfda184 ("mm, oom: protect !costly allocations some more for
> > !CONFIG_COMPACTION") was an attempt to reduce chances of pre-mature OOM
> > killer invocation for high order requests. It seemed to work for most
> > users just fine but it is far from bullet proof and obviously not
> > sufficient for Marc who has reported pre-mature OOM killer invocations
> > with 4.8 based kernels. 4.9 with all the compaction improvements seems
> > to be behaving much better but that would be too intrusive to backport
> > to 4.8 stable kernels. Instead this patch simply never declares OOM for
> > !costly high order requests. We rely on order-0 requests to do that in
> > case we are really out of memory. Order-0 requests are much more common
> > and so a risk of a livelock without any way forward is highly unlikely.
> >
> > Reported-by: Marc MERLIN <marc@xxxxxxxxxxx>
> > Signed-off-by: Michal Hocko <mhocko@xxxxxxxx>

Tested-by: Marc MERLIN <marc@xxxxxxxxxxx>

Marc

> > diff --git a/mm/page_alloc.c b/mm/page_alloc.c
> > index a2214c64ed3c..7401e996009a 100644
> > --- a/mm/page_alloc.c
> > +++ b/mm/page_alloc.c
> > @@ -3161,6 +3161,16 @@ should_compact_retry(struct alloc_context *ac, unsigned int order, int alloc_fla
> >  	if (!order || order > PAGE_ALLOC_COSTLY_ORDER)
> >  		return false;
> >
> > +#ifdef CONFIG_COMPACTION
> > +	/*
> > +	 * This is a gross workaround to compensate a lack of reliable compaction
> > +	 * operation. We cannot simply go OOM with the current state of the compaction
> > +	 * code because this can lead to pre mature OOM declaration.
> > +	 */
> > +	if (order <= PAGE_ALLOC_COSTLY_ORDER)
> > +		return true;
> > +#endif
> > +
> >  	/*
> >  	 * There are setups with compaction disabled which would prefer to loop
> >  	 * inside the allocator rather than hit the oom killer prematurely.
> > --
> > Michal Hocko
> > SUSE Labs
>
> --
> Michal Hocko
> SUSE Labs
>
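
For anyone skimming the thread, here is a rough userspace sketch of what the
early checks in the patched should_compact_retry() amount to when
CONFIG_COMPACTION is set. The function name model_should_compact_retry() is
hypothetical and the model is an assumption-laden simplification: the real
kernel function takes more arguments and has further logic after these checks.
PAGE_ALLOC_COSTLY_ORDER is 3, as in the kernel.

```c
#include <stdbool.h>

/* Same value the kernel uses for PAGE_ALLOC_COSTLY_ORDER. */
#define PAGE_ALLOC_COSTLY_ORDER 3

/*
 * Hypothetical model (not kernel code) of the patched early checks in
 * should_compact_retry() with CONFIG_COMPACTION enabled.
 * Returns true when the allocator would keep retrying compaction
 * instead of moving on towards the OOM killer.
 */
static bool model_should_compact_retry(unsigned int order)
{
	/* Order-0 and costly (order > 3) requests are not retried here. */
	if (!order || order > PAGE_ALLOC_COSTLY_ORDER)
		return false;

	/*
	 * Patched behaviour: every remaining request (orders 1..3) is
	 * retried unconditionally, so it never declares OOM by itself.
	 */
	return true;
}
```

In this model only orders 1 through 3 return true. Order-0 requests still take
the normal path, which is what the commit message relies on to declare OOM when
memory is really exhausted.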

--
"A mouse is a device used to point at the xterm you want to type in" - A.S.R.
Microsoft is to operating systems ....
.... what McDonalds is to gourmet cooking
Home page: http://marc.merlins.org/ | PGP 1024R/763BE901