Re: [patch] vmscan: fix zone shrinking exit when scan work is done

From: Mel Gorman
Date: Wed Feb 09 2011 - 11:47:27 EST


On Wed, Feb 09, 2011 at 04:46:06PM +0100, Johannes Weiner wrote:
> Hi,
>
> I think this should fix the problem of processes getting stuck in
> reclaim that has been reported several times.

I don't think it's the only source, but I'm basing that on having seen
constant looping in balance_pgdat() with repeated calls to
congestion_wait() a few weeks ago, which I haven't rechecked since.
However, this looks like a real fix for a real problem.

> Kent actually
> single-stepped through this code and noted that it was never exiting
> shrink_zone(), which really narrowed it down a lot, considering the
> tons of nested loops from the allocator down to the list shrinking.
>
> Hannes
>
> ---
> From: Johannes Weiner <hannes@xxxxxxxxxxx>
> Subject: vmscan: fix zone shrinking exit when scan work is done
>
> '3e7d344 mm: vmscan: reclaim order-0 and use compaction instead of
> lumpy reclaim' introduced an indefinite loop in shrink_zone().
>
> It meant to break out of this loop when no pages had been reclaimed
> and not a single page was even scanned. The way it would detect the
> latter is by taking a snapshot of sc->nr_scanned at the beginning of
> the function and comparing it against the new sc->nr_scanned after the
> scan loop. But it would re-iterate without updating that snapshot,
> looping forever if sc->nr_scanned changed at least once since
> shrink_zone() was invoked.
>
> This is not the sole condition that would exit that loop, but the
> others require other processes to change the zone state, as the
> reclaimer that is stuck obviously cannot do so anymore.
>
> This is only happening for higher-order allocations, where reclaim is
> run back to back with compaction.
>
> Reported-by: Michal Hocko <mhocko@xxxxxxx>
> Reported-by: Kent Overstreet <kent.overstreet@xxxxxxxxx>
> Signed-off-by: Johannes Weiner <hannes@xxxxxxxxxxx>

Well spotted.

Acked-by: Mel Gorman <mel@xxxxxxxxx>
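
For anyone following along, the stale-snapshot problem described in the
changelog can be sketched in userspace roughly as below. This is only an
illustration under assumptions, not the kernel code or the patch: the
reduced struct scan_control, pass_work[], should_continue() and
shrink_passes() are hypothetical stand-ins, the exit test is simplified
to "stop only when nothing was reclaimed and nothing was scanned", and
the loop is capped so the broken variant terminates.

```c
#include <stdbool.h>

/* Hypothetical, reduced stand-in for the kernel's struct scan_control. */
struct scan_control {
	unsigned long nr_scanned;	/* running total of pages scanned */
};

/* Pages each simulated scan pass looks at: work runs out after pass 0. */
static const unsigned long pass_work[] = { 32, 0, 0, 0, 0, 0, 0, 0 };
#define MAX_PASSES (sizeof(pass_work) / sizeof(pass_work[0]))

/* Simplified exit test: keep going unless nothing was reclaimed or scanned. */
static bool should_continue(unsigned long nr_reclaimed,
			    unsigned long nr_scanned)
{
	return nr_reclaimed || nr_scanned;
}

/*
 * Returns the number of passes run before the loop exits. The count is
 * capped at MAX_PASSES so the stale-snapshot variant terminates here.
 */
static unsigned int shrink_passes(bool refresh_snapshot)
{
	struct scan_control sc = { .nr_scanned = 0 };
	unsigned long snapshot = sc.nr_scanned;	/* taken once on entry */
	unsigned int pass = 0;

	while (pass < MAX_PASSES) {
		unsigned long nr_reclaimed = 0;	/* this demo never reclaims */

		if (refresh_snapshot)
			snapshot = sc.nr_scanned;	/* re-taken every pass */

		sc.nr_scanned += pass_work[pass++];

		/* Compare against the snapshot, stale or fresh. */
		if (!should_continue(nr_reclaimed, sc.nr_scanned - snapshot))
			break;
	}
	return pass;
}
```

With the snapshot taken only once, shrink_passes(false) runs into the
cap of 8 passes, because the 32 pages scanned in the first pass keep the
difference non-zero forever. With the snapshot refreshed on every pass,
shrink_passes(true) returns after 2 passes, the second one having
scanned nothing.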

--
Mel Gorman
SUSE Labs