[RFC] mm: bail out in shrink_inactive_list

From: Minchan Kim
Date: Mon Jul 25 2016 - 03:51:48 EST


With node-lru, if there are enough reclaimable pages in highmem
but none in lowmem, the VM can still try to shrink the inactive list
even though the requested zone is lowmem.

The problem is that a direct reclaimer scans an inactive list that is
full of highmem pages to find a victim page in the requested zone or
lower zones, and ends up skipping every page. That just burns CPU.
Worse, when many reclaimers run in parallel, many direct reclaimers
are stalled by too_many_isolated even though there is no reclaimable
memory in the inactive list.
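
To make the wasted work concrete, here is a minimal userspace sketch of
that scan. It is not kernel code: the toy_* names and the array model of
the inactive list are assumptions made up for illustration. The only
thing it borrows is the rule that a page is usable only if its zone
index is at or below the requested one.

```c
#include <assert.h>
#include <stddef.h>

/* Toy zone indices, ordered low to high (hypothetical, not kernel enums). */
enum toy_zone { TOY_DMA, TOY_NORMAL, TOY_HIGHMEM, TOY_NR_ZONES };

struct toy_scan_result {
	size_t scanned;		/* pages the reclaimer looked at */
	size_t eligible;	/* pages at or below the requested zone */
	size_t skipped;		/* pages above the requested zone */
};

/*
 * Walk a node-wide inactive list, modeled as an array holding the zone
 * index of each page, and count how many pages a request limited to
 * reclaim_idx could actually use.
 */
static struct toy_scan_result toy_scan_inactive(const enum toy_zone *page_zone,
						size_t nr_pages,
						enum toy_zone reclaim_idx)
{
	struct toy_scan_result res = { 0, 0, 0 };
	size_t i;

	assert(reclaim_idx < TOY_NR_ZONES);

	for (i = 0; i < nr_pages; i++) {
		res.scanned++;
		if (page_zone[i] <= reclaim_idx)
			res.eligible++;
		else
			res.skipped++;
	}
	return res;
}
```

With a list that holds only TOY_HIGHMEM entries and a TOY_NORMAL
request, every page is scanned and every page is skipped, which is the
pure CPU cost described above.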

I ran the experiment 4 times on a 32-bit KVM machine with 2G of
memory and 8 CPUs and measured the elapsed time.

hackbench 500 process 2

= Old =

1st: 289s 2nd: 310s 3rd: 112s 4th: 272s

= Now =

1st: 31s 2nd: 132s 3rd: 162s 4th: 50s

Signed-off-by: Minchan Kim <minchan@xxxxxxxxxx>
---
I believe the proper fix is to modify get_scan_count. IOW, I think
we should introduce lruvec_reclaimable_lru_size with a proper
classzone_idx, but I don't know how we can do that for memcg,
which doesn't have zone stats now. Should we bring zone stats
back to memcg? Or is it okay to ignore memcg?
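
As a rough illustration of the classzone-aware size mentioned above,
here is a userspace sketch. It is an assumption, not an implementation:
lruvec_reclaimable_lru_size does not exist yet, and struct toy_node,
TOY_MAX_NR_ZONES and toy_reclaimable_lru_size are hypothetical names
that model per-zone counters with plain arrays. It covers the global
case only and says nothing about memcg.

```c
#include <assert.h>
#include <stdbool.h>

#define TOY_MAX_NR_ZONES 4	/* hypothetical zone count for this model */

/* Toy stand-in for a node's per-zone inactive LRU counters. */
struct toy_node {
	bool populated[TOY_MAX_NR_ZONES];
	unsigned long nr_inactive[TOY_MAX_NR_ZONES];
};

/*
 * Sum the inactive pages in populated zones 0..classzone_idx, so that
 * pages in zones above the request do not count as reclaimable.
 */
static unsigned long toy_reclaimable_lru_size(const struct toy_node *node,
					      int classzone_idx)
{
	unsigned long total = 0;
	int zid;

	assert(classzone_idx >= 0 && classzone_idx < TOY_MAX_NR_ZONES);

	for (zid = classzone_idx; zid >= 0; zid--) {
		if (!node->populated[zid])
			continue;
		total += node->nr_inactive[zid];
	}
	return total;
}
```

A caller sizing its scan from this value, rather than from the
node-wide LRU size, would see a small number for a lowmem request even
when highmem holds thousands of inactive pages.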

mm/vmscan.c | 28 ++++++++++++++++++++++++++++
1 file changed, 28 insertions(+)

diff --git a/mm/vmscan.c b/mm/vmscan.c
index e5af357..3d285cc 100644
--- a/mm/vmscan.c
+++ b/mm/vmscan.c
@@ -1652,6 +1652,31 @@ static int current_may_throttle(void)
 		bdi_write_congested(current->backing_dev_info);
 }

+static inline bool inactive_reclaimable_pages(struct lruvec *lruvec,
+				struct scan_control *sc,
+				enum lru_list lru)
+{
+	int zid;
+	struct zone *zone;
+	bool file = is_file_lru(lru);
+	struct pglist_data *pgdat = lruvec_pgdat(lruvec);
+
+	if (!global_reclaim(sc))
+		return true;
+
+	for (zid = sc->reclaim_idx; zid >= 0; zid--) {
+		zone = &pgdat->node_zones[zid];
+		if (!populated_zone(zone))
+			continue;
+
+		if (zone_page_state_snapshot(zone, NR_ZONE_LRU_BASE +
+				LRU_FILE * file) >= SWAP_CLUSTER_MAX)
+			return true;
+	}
+
+	return false;
+}
+
 /*
  * shrink_inactive_list() is a helper for shrink_node(). It returns the number
  * of reclaimed pages
@@ -1674,6 +1699,9 @@ shrink_inactive_list(unsigned long nr_to_scan, struct lruvec *lruvec,
 	struct pglist_data *pgdat = lruvec_pgdat(lruvec);
 	struct zone_reclaim_stat *reclaim_stat = &lruvec->reclaim_stat;

+	if (!inactive_reclaimable_pages(lruvec, sc, lru))
+		return 0;
+
 	while (unlikely(too_many_isolated(pgdat, file, sc))) {
 		congestion_wait(BLK_RW_ASYNC, HZ/10);

--
1.9.1