Re: [RFC PATCH] mm, memcg: fix (Re: OOM: Better, but still there on)

From: Nils Holland
Date: Mon Dec 26 2016 - 13:57:23 EST


On Mon, Dec 26, 2016 at 01:48:40PM +0100, Michal Hocko wrote:
> On Fri 23-12-16 23:26:00, Nils Holland wrote:
> > On Fri, Dec 23, 2016 at 03:47:39PM +0100, Michal Hocko wrote:
> > >
> > > Nils, even though this is still highly experimental, could you give it a
> > > try please?
> >
> > Yes, no problem! So I kept the very first patch you sent but had to
> > revert the latest version of the debugging patch (the one in
> > which you added the "mm_vmscan_inactive_list_is_low" event) because
> > otherwise the patch you just sent wouldn't apply. Then I rebooted with
> > memory cgroups enabled again, and the first thing that strikes the eye
> > is that I get this during boot:
> >
> > [ 1.568174] ------------[ cut here ]------------
> > [ 1.568327] WARNING: CPU: 0 PID: 1 at mm/memcontrol.c:1032 mem_cgroup_update_lru_size+0x118/0x130
> > [ 1.568543] mem_cgroup_update_lru_size(f4406400, 2, 1): lru_size 0 but not empty
>
> Ohh, I can see what is wrong! a) there is a bug in the accounting in
> my patch (I double account) and b) the detection for the empty list
> cannot work after my change because per node zone will not match per
> zone statistics. The updated patch is below. So I hope my brain already
> works after it's been mostly off last few days...

I tried the updated patch, and I can confirm that the warning during
boot is gone. I also ran my usual procedure for reproducing the
problem, and a kernel with this new patch works fine and doesn't
produce OOMs or similar issues.

I had the previous version of the patch running non-stop on a machine
for the last few days under normal day-to-day workloads and didn't
notice any issues. I'll now keep a machine running with this updated
patch for the next few days, and if I notice anything that doesn't
look normal, I'll of course report back!

Greetings
Nils