Re: INFO: rcu detected stall in shmem_fault

From: Michal Hocko
Date: Wed Oct 10 2018 - 05:02:43 EST


On Tue 09-10-18 21:11:48, David Rientjes wrote:
> On Wed, 10 Oct 2018, Tetsuo Handa wrote:
>
> > syzbot is hitting RCU stall due to memcg-OOM event.
> > https://syzkaller.appspot.com/bug?id=4ae3fff7fcf4c33a47c1192d2d62d2e03efffa64
> >
> > What should we do if memcg-OOM found no killable task because the allocating task
> > was oom_score_adj == -1000 ? Flooding printk() until RCU stall watchdog fires
> > (which seems to be caused by commit 3100dab2aa09dc6e ("mm: memcontrol: print proper
> > OOM header when no eligible victim left") because syzbot was terminating the test
> > upon WARN(1) removed by that commit) is not a good behavior.
> >
>
> Not printing anything would be the obvious solution but the ideal solution
> would probably involve
>
> - adding feedback to the memcg oom killer that there are no killable
> processes,

We already have that: out_of_memory() returns false in that case.

> - adding complete coverage for memcg_oom_recover() in all uncharge paths
> where the oom memcg's page_counter is decremented, and

Could you elaborate?

> - having all processes stall until memcg_oom_recover() is called so
> looping back into try_charge() has a reasonable expectation to succeed.

You cannot stall in the charge path waiting for others to make forward
progress, because we would be back to OOM deadlocks where nobody can make
forward progress due to lock dependencies.

Right now we simply force the charge and allow further progress when a
situation like this happens, because it shouldn't happen unless the
memcg is badly misconfigured.
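
To illustrate the behavior described above, here is a minimal userspace
sketch of "force the charge when no victim is left". It is a toy model, not
kernel code: struct toy_memcg, toy_out_of_memory() and toy_try_charge() are
hypothetical names made up for this example, and the real try_charge() path
does considerably more (reclaim, retries, etc.).

```c
#include <assert.h>
#include <stdbool.h>

/* Toy model only; none of these names are real kernel APIs. */
struct toy_memcg {
	long usage;        /* pages currently charged */
	long limit;        /* hard limit in pages */
	int  killable;     /* tasks the OOM killer may still pick */
	long victim_pages; /* pages released by each killed task */
};

enum charge_result { CHARGE_OK, CHARGE_AFTER_KILL, CHARGE_FORCED };

/* Stand-in for out_of_memory(): false means no eligible victim exists. */
static bool toy_out_of_memory(struct toy_memcg *cg)
{
	if (cg->killable == 0)
		return false;

	cg->killable--;
	cg->usage -= cg->victim_pages;
	if (cg->usage < 0)
		cg->usage = 0;
	return true;
}

static enum charge_result toy_try_charge(struct toy_memcg *cg, long nr_pages)
{
	bool killed = false;

	assert(nr_pages > 0);

	for (;;) {
		if (cg->usage + nr_pages <= cg->limit) {
			cg->usage += nr_pages;
			return killed ? CHARGE_AFTER_KILL : CHARGE_OK;
		}

		if (!toy_out_of_memory(cg)) {
			/*
			 * No victim left: neither loop nor sleep waiting
			 * for somebody else. Exceed the limit so the caller
			 * can make progress.
			 */
			cg->usage += nr_pages;
			return CHARGE_FORCED;
		}
		killed = true;
	}
}
```

With usage == limit and killable == 0, toy_try_charge() returns
CHARGE_FORCED and usage ends up above the limit, instead of the caller
spinning or blocking in the charge path.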
--
Michal Hocko
SUSE Labs