Re: [PATCH] mm,oom: Use timeout based back off.

From: Andrew Morton
Date: Wed Oct 24 2018 - 18:55:00 EST


On Mon, 22 Oct 2018 14:11:10 -0700 (PDT) David Rientjes <rientjes@xxxxxxxxxx> wrote:

> > Michal has been refusing timeout based approach, but I don't think this
> > is something we have to be frayed around the edge about possibility of
> > overlooking races/bugs just because Michal does not want to use timeout.
> > I believe that timeout based back off is the only approach we can use
> > for now.
> >
>
> I've proposed patches that have been running for months in a production
> environment that make the oom killer useful without serially killing many
> processes unnecessarily. At this point, it is *much* easier to just fork
> the oom killer logic rather than continue to invest time into fixing it in
> Linux. That's unfortunate because I'm sure you realize how problematic
> the current implementation is, how abusive it is, and have seen its
> effects yourself. I admire your persistence in trying to fix the issues
> surrounding the oom killer, but have come to the conclusion that forking
> it is a much better use of time.

The oom killer is, I think, fairly standalone and it shouldn't be too
hard to add the infrastructure to make the whole thing pluggable. At
runtime, not at build time.

But it is a last resort - it will result in fragmented effort and
difficult decisions for everyone regarding which should be used.

There has been a lot of heat and noise and confusion and handwaving in
all of this. What we're crying out for is simple testcases which
everyone can run. Find a problem, write the testcase, distribute that.
Develop a solution for that testcase, then move on to the next one.