Re: [RFC] respect the referenced bit of KVM guest pages?

From: Wu Fengguang
Date: Sun Aug 16 2009 - 01:41:38 EST


On Sun, Aug 16, 2009 at 01:09:03PM +0800, Balbir Singh wrote:
> * Wu Fengguang <fengguang.wu@xxxxxxxxx> [2009-08-15 13:45:24]:
>
> > On Fri, Aug 14, 2009 at 09:19:35PM +0800, Rik van Riel wrote:
> > > Wu Fengguang wrote:
> > > > On Fri, Aug 14, 2009 at 05:10:55PM +0800, Johannes Weiner wrote:
> > >
> > > >> So even with the active list being a FIFO, we keep usage information
> > > >> gathered from the inactive list. If we deactivate pages in arbitrary
> > > >> list intervals, we throw this away.
> > > >
> > > > We do have the danger of FIFO, if inactive list is small enough, so
> > > > that (unconditionally) deactivated pages quickly get reclaimed and
> > > > their life window in inactive list is too small to be useful.
> > >
> > > This is one of the reasons why we unconditionally deactivate
> > > the active anon pages, and do background scanning of the
> > > active anon list when reclaiming page cache pages.
> > >
> > > We want to always move some pages to the inactive anon
> > > list, so it does not get too small.
> >
> > Right, the current code tries to pull the inactive list out of its
> > smallish state as long as there is vmscan activity.
> >
> > However there is a possible (and tricky) hole: mem cgroups
> > don't do batched vmscan. shrink_zone() may call shrink_list()
> > with nr_to_scan=1, in which case shrink_list() _still_ calls
> > isolate_pages() with the much larger SWAP_CLUSTER_MAX.
> >
> > This effectively scales up the inactive list scan rate by 10 times
> > while the list is still small, and may thus prevent it from ever growing.
> >
>
> I think we need to possibly export some scanning data under DEBUG_VM
> to cross verify.

Maybe we can write more general debugging code, but here is a quick patch
for examining the cgroup case. Note that even for the global zones,
max_scan may well not be a multiple of SWAP_CLUSTER_MAX, so
shrink_inactive_list() will scan a little more in its last loop.
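
To make the over-scan concrete, here is a rough userspace model of the
loop (not kernel code: model_pages_scanned() and its parameters are made
up for illustration, and only the SWAP_CLUSTER_MAX value of 32 is taken
from the kernel). It assumes each pass isolates a full batch regardless
of how small max_scan is:

```c
#include <assert.h>

#define SWAP_CLUSTER_MAX 32UL	/* same value as the kernel's constant */

/*
 * Simplified model of the scan loop: every pass isolates up to
 * SWAP_CLUSTER_MAX pages, no matter how small max_scan is, and the
 * loop only stops once at least max_scan pages have been scanned.
 * Returns the number of pages actually scanned.
 */
static unsigned long model_pages_scanned(unsigned long list_size,
					 unsigned long max_scan)
{
	unsigned long scanned = 0;

	do {
		unsigned long left = list_size - scanned;
		unsigned long batch = left < SWAP_CLUSTER_MAX ?
				      left : SWAP_CLUSTER_MAX;

		if (batch == 0)		/* list exhausted */
			break;
		scanned += batch;
	} while (scanned < max_scan);

	/* The model can never scan more pages than the list holds. */
	assert(scanned <= list_size);
	return scanned;
}
```

In this model, a 1000-page list with max_scan=1 gets 32 pages scanned,
and max_scan=33 gets 64, i.e. the last pass always rounds up to a full
batch.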

---
 mm/vmscan.c |    7 +++++++
 1 file changed, 7 insertions(+)

--- linux.orig/mm/vmscan.c 2009-08-16 13:24:25.000000000 +0800
+++ linux/mm/vmscan.c 2009-08-16 13:38:32.000000000 +0800
@@ -1043,6 +1043,13 @@ static unsigned long shrink_inactive_lis
 	struct zone_reclaim_stat *reclaim_stat = get_reclaim_stat(zone, sc);
 	int lumpy_reclaim = 0;

+	if (!scanning_global_lru(sc))
+		printk("shrink inactive %s count=%lu scan=%lu\n",
+		       file ? "file" : "anon",
+		       mem_cgroup_zone_nr_pages(sc->mem_cgroup, zone,
+						LRU_INACTIVE_ANON + !!file),
+		       max_scan);
+
 	/*
 	 * If we need a large contiguous chunk of memory, or have
 	 * trouble getting a small set of contiguous pages, we