Re: [PATCH 0/5] Candidate fixes for premature OOM kills with node-lru v2

From: Mel Gorman
Date: Thu Jul 28 2016 - 06:28:08 EST


On Thu, Jul 28, 2016 at 03:44:33PM +0900, Joonsoo Kim wrote:
> > To some extent, it could be "addressed" by immediately reclaiming active
> > pages moving to the inactive list at the cost of distorting page age for a
> > workload that is genuinely close to OOM. That is similar to what zone-lru
> > ended up doing -- fast reclaiming young pages from a zone.
>
> My expectation for my test case is that reclaimers should kick out
> actively used pages and make room for 'fork', because the parallel
> readers would still work even if the pages they read are not cached.
>
> It is sensitive to reclaimer efficiency because the parallel readers
> read pages repeatedly and disturb reclaim. I thought it was a good
> test for node-lru, which changes reclaimer efficiency for lower
> zones. However, as you said, this efficiency comes at the cost of
> distorting page aging, so now I'm not sure it is a problem that we
> need to consider. Let's skip it?
>

I think we should skip it for now. The alterations are too specific to a
test case that is very close to being genuinely OOM. Adjusting timing
for one OOM case may just lead to complaints that OOM is detected too
slowly in others.

> Anyway, thanks for tracking down the problem.
>

My pleasure, thanks to both you and Minchan for persisting with this as
we got some important fixes out of the discussion.

--
Mel Gorman
SUSE Labs