Re: [patch] mm, oom: make a last minute check to prevent unnecessary memcg oom kills

From: Ami Fischman
Date: Tue Mar 17 2020 - 15:01:01 EST


On Tue, Mar 17, 2020 at 11:26 AM Robert Kolchmeyer
<rkolchmeyer@xxxxxxxxxx> wrote:
>
> On Tue, Mar 10, 2020 at 3:54 PM David Rientjes <rientjes@xxxxxxxxxx> wrote:
> >
> > Robert, could you elaborate on the user-visible effects of this issue that
> > caused it to initially get reported?
>
> Ami (now cc'ed) knows more, but here is my understanding.

Robert's description of the mechanics we observed is accurate.

We discovered this regression in the oom-killer's behavior when
attempting to upgrade our system. The fraction of the system that
went unhealthy due to this issue was approximately equal to the
_sum_ of all other causes of unhealth, which are many and varied,
but each of which contributes only a small amount of
unhealth. This issue forced a rollback to the previous kernel
where we ~never see this behavior, returning our unhealth levels
to the previous background levels.

Cheers,
-a