Re: Detecting page cache thrashing state

From: Johannes Weiner
Date: Mon Sep 18 2017 - 12:34:58 EST


Hi Taras,

On Fri, Sep 15, 2017 at 10:28:30AM -0700, Taras Kondratiuk wrote:
> Quoting Michal Hocko (2017-09-15 07:36:19)
> > On Thu 14-09-17 17:16:27, Taras Kondratiuk wrote:
> > > Has somebody faced similar issue? How are you solving it?
> >
> > Yes, this has been a pain point for a _long_ time, and we still do not have a
> > good answer upstream. Johannes has been playing in this area [1].
> > The main problem is that our OOM detection logic is based on the ability
> > to reclaim memory in order to allocate new memory. And reclaim pretty much
> > always succeeds for the page cache when you are thrashing, so we do not
> > know that basically the whole time is spent refaulting the memory back and
> > forth. We do have some refault stats for the page cache, but they are not
> > integrated into the OOM detection logic, because this is really a
> > non-trivial problem to solve without triggering early OOM killer
> > invocations.
> >
> > [1] http://lkml.kernel.org/r/20170727153010.23347-1-hannes@xxxxxxxxxxx
>
> Thanks, Michal. memdelay looks promising. We will check it.

Great, I'm obviously interested in more users of it :) Please find
attached the latest version of the patch series based on v4.13.

It needs a bit more refactoring in the scheduler bits before
resubmission, but it already contains a couple of fixes and
improvements since the first version I sent out.

Let me know if you need help rebasing to a different kernel version.
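A minimal sketch, not part of the memdelay series, for watching the page cache refault stats mentioned above from userspace: it samples the workingset refault counter in /proc/vmstat twice and reports refaults per second. The counter names are an assumption that depends on kernel version (a single `workingset_refault` on kernels of this era, split into `_file` and `_anon` variants on later ones), and the helper names are hypothetical. Note that this only counts refaults; it does not say how much time is lost to them, which is exactly the gap described above.

```python
import time


def parse_vmstat(text):
    """Parse /proc/vmstat-style "name value" lines into a dict of ints."""
    stats = {}
    for line in text.splitlines():
        parts = line.split()
        # Keep only well-formed "name <non-negative integer>" lines.
        if len(parts) == 2 and parts[1].isdigit():
            stats[parts[0]] = int(parts[1])
    return stats


def file_refaults(stats):
    """Return the page cache refault counter from a parsed snapshot.

    Assumption: older kernels expose a single "workingset_refault"
    counter, while newer ones split it and the page cache part is
    "workingset_refault_file".
    """
    for key in ("workingset_refault_file", "workingset_refault"):
        if key in stats:
            return stats[key]
    raise KeyError("no workingset refault counter found")


def refault_rate(before, after, interval_s):
    """Page cache refaults per second between two parsed snapshots."""
    return (file_refaults(after) - file_refaults(before)) / interval_s


def sample_refault_rate(interval_s=1.0, path="/proc/vmstat"):
    """Hypothetical helper: sample the refault rate on a live Linux system."""
    with open(path) as f:
        before = parse_vmstat(f.read())
    time.sleep(interval_s)
    with open(path) as f:
        after = parse_vmstat(f.read())
    return refault_rate(before, after, interval_s)
```

On a live system, `sample_refault_rate(5.0)` would return the average rate over five seconds. A rate that stays high while the workload makes little progress hints at thrashing, but any threshold for acting on it is workload-specific and is left as an assumption here.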