Re: zone_reclaimable() leads to livelock in __alloc_pages_slowpath()

From: Oleg Nesterov
Date: Tue May 24 2016 - 18:43:49 EST


On 05/24, Michal Hocko wrote:
>
> On Mon 23-05-16 17:14:19, Oleg Nesterov wrote:
> > On 05/23, Michal Hocko wrote:
> [...]
> > > Could you add some tracing and see what are the numbers
> > > above?
> >
> > with the patch below I can press Ctrl-C when it hangs, this breaks the
> > endless loop and the output looks like
> >
> > vmscan: ZONE=ffffffff8189f180 0 scanned=0 pages=6
> > vmscan: ZONE=ffffffff8189eb00 0 scanned=1 pages=0
> > ...
> > vmscan: ZONE=ffffffff8189eb00 0 scanned=2 pages=1
> > vmscan: ZONE=ffffffff8189f180 0 scanned=4 pages=6
> > ...
> > vmscan: ZONE=ffffffff8189f180 0 scanned=4 pages=6
> > vmscan: ZONE=ffffffff8189f180 0 scanned=4 pages=6
> >
> > the numbers are always small.
>
> Small but scanned is not 0 and constant which means it either gets reset
> repeatedly (something gets freed) or we have stopped scanning. Which
> pattern can you see? I assume that the swap space is full at the time
> (could you add get_nr_swap_pages() to the output).

no, I tested this without SWAP,

> Also zone->name would
> be better than the pointer.

Yes, forgot to mention, this is DMA32. To remind, only 512m of RAM so
this is natural.

> I am trying to reproduce but your test case always hits the oom killer:

Did you try to run it in a loop? Usually it takes a while before the system
hangs.

> Swap: 138236 57740 80496

perhaps this makes a difference? See above, I have no SWAP.


So. I spent almost the whole day trying to understand whats going on, and
of course I failed.

But. It _seems to me_ that the kernel "leaks" some pages in LRU_INACTIVE_FILE
list because inactive_file_is_low() returns the wrong value. And do not even
ask me why I think so, unlikely I will be able to explain ;) to remind, I never
tried to read vmscan.c before.

But. if I change lruvec_lru_size()

- return zone_page_state(lruvec_zone(lruvec), NR_LRU_BASE + lru);
+ return zone_page_state_snapshot(lruvec_zone(lruvec), NR_LRU_BASE + lru);

the problem goes away too.

To remind, it also goes away if I change calculate_normal_threshold() to return
zero, and it was not clear why. Now we can probably conclude that that this is
because the change obviouslt affects lruvec_lru_size().

Oleg.