Re: [PATCH] mm,vmscan: Use accurate values for zone_reclaimable() checks

From: Tetsuo Handa
Date: Wed Oct 21 2015 - 11:39:11 EST


Michal Hocko wrote:
> On Wed 21-10-15 09:49:07, Christoph Lameter wrote:
> > On Wed, 21 Oct 2015, Michal Hocko wrote:
> >
> > > Because all the WQ workers are stuck somewhere, maybe in the memory
> > > allocation which cannot make any progress and the vmstat update work is
> > > queued behind them.

After the OOM killer is invoked, we can easily observe that vmstat_update
cannot be processed because a memory allocation in disk_events_workfn stalls.
http://lkml.kernel.org/r/201509120019.BJI48986.OOSVMJtOLFQHFF@xxxxxxxxxxxxxxxxxxx

I was worried that a work item which blocks forever effectively occupies
the workqueue exclusively. In fact, changing that allocation to GFP_ATOMIC
avoids this problem.
http://lkml.kernel.org/r/201503012017.EAD00571.HOOJVOStMFLFQF@xxxxxxxxxxxxxxxxxxx

Now we have realized that we can hit this problem even before the OOM
killer is invoked. The situation is similar to the case after the OOM
killer is invoked: there are no reclaimable pages, but vmstat_update
cannot be processed. We are caught by a small difference in the vmstat
counter values.

> > >
> > > At least this is my current understanding.
> >
> > Eww. Maybe need a queue that does not do such evil things as memory
> > allocation?
>
> I am not sure how to achieve that. Requiring non-sleeping worker would
> work out but do we have enough users to add such an API?

If a queue does not need to sleep, can't that queue be processed from
timer context (e.g. mod_timer())?