Re: [PATCH 2/7] mm, vmscan: add active list aging tracepoint

From: Minchan Kim
Date: Tue Jan 03 2017 - 00:04:00 EST


Hi Michal,

On Fri, Dec 30, 2016 at 05:37:42PM +0100, Michal Hocko wrote:
> On Sat 31-12-16 01:04:56, Minchan Kim wrote:
> [...]
> > > From 5f1bc22ad1e54050b4da3228d68945e70342ebb6 Mon Sep 17 00:00:00 2001
> > > From: Michal Hocko <mhocko@xxxxxxxx>
> > > Date: Tue, 27 Dec 2016 13:18:20 +0100
> > > Subject: [PATCH] mm, vmscan: add active list aging tracepoint
> > >
> > > Our reclaim process has several tracepoints to tell us more about how
> > > things are progressing. We are, however, missing a tracepoint to track
> > > active list aging. Introduce mm_vmscan_lru_shrink_active which reports
> >
> > I agree with this part.
> >
> > > the number of
> > > - nr_scanned, nr_taken pages to tell us the LRU isolation
> > > effectiveness.
> >
> > I agree with nr_taken for knowing shrinking effectiveness but don't
> > agree with nr_scanned. If we want to know LRU isolation effectiveness
> > with nr_scanned and nr_taken, isolate_lru_pages will do.
>
> Yes it will. On the other hand the number is there and there is no
> additional overhead, maintenance or otherwise, to provide that number.

You are adding some instructions, so how can you claim there is no overhead?
Leaving aside whether it's measurable: even if it's small in this particular
case, it would become measurable if everyone started saying "it's trivial, so
what's the problem with adding a few instructions even though they are
duplicated?"

You already said "LRU isolation effectiveness". That should be done in
isolate_lru_pages, and we already do it there. If you want to add
duplicated work, you need a stronger reason.
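For illustration only, the "LRU isolation effectiveness" discussed here
boils down to comparing nr_taken with nr_scanned for one isolation pass.
The helper below is a hypothetical userspace sketch for post-processing
trace output; the name isolation_pct() is an assumption and is not part
of the kernel or of this patch.

```c
#include <assert.h>

/*
 * Hypothetical userspace helper (not kernel code): given nr_scanned and
 * nr_taken from one LRU isolation pass, return the share of scanned pages
 * that were taken, in percent (integer division, rounded down).
 */
static unsigned long isolation_pct(unsigned long nr_scanned,
				   unsigned long nr_taken)
{
	/* A pass that scanned nothing cannot have taken anything. */
	assert(nr_scanned > 0 || nr_taken == 0);
	if (nr_scanned == 0)
		return 0;	/* nothing scanned, nothing to report */
	return nr_taken * 100 / nr_scanned;
}
```

For example, isolation_pct(32, 8) returns 25.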

> The inactive counterpart does that for quite some time already. So why

That can't be a reason. If it is duplicated there, it would be better to
fix that than to add more duplicated work just to make both sides match.

> exactly does that matter? Don't take me wrong but isn't this more on a
> nit picking side than necessary? Or do I just misunderstand your
> concerns? It is not like we are providing a stable user API as the

My concern is that I don't see what benefit we get from that duplicated
work. If it doesn't benefit us, I don't want to add it. I hope you can
think of other, more convincing reasons.

> tracepoint is clearly implementation specific and not something to be
> used for anything other than debugging.

My point is that we already have "LRU isolation effectiveness", namely in
isolate_lru_pages.

>
> > > - nr_rotated pages which tells us that we are hitting referenced
> > > pages which are deactivated. If this is a large part of the
> > > reported nr_deactivated pages then the active list is too small
> >
> > It might be but not exactly. If your goal is to know LRU size, it can be
> > done in get_scan_count. I tend to agree LRU size is helpful for
> > performance analysis because decreased LRU size signals memory shortage
> > then performance drop.
>
> No, I am not really interested in the exact size but rather to allow to
> find whether we are aging the active list too early...

Could you elaborate on how nr_rotated would tell us that the active list
is being aged too early?
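As a reference point, the heuristic in the quoted changelog (a large
nr_rotated share of nr_deactivated suggests the active list is too small)
can be sketched as a hypothetical userspace check on trace output. The
name active_list_maybe_too_small() and the threshold_pct parameter are
assumptions for illustration; the changelog does not define any threshold.

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Hypothetical userspace helper (not kernel code) for the changelog's
 * heuristic: flag a sample when nr_rotated is at least threshold_pct
 * percent of nr_deactivated. threshold_pct is an arbitrary illustration
 * value, not a kernel tunable.
 */
static bool active_list_maybe_too_small(unsigned long nr_rotated,
					unsigned long nr_deactivated,
					unsigned int threshold_pct)
{
	assert(threshold_pct <= 100);
	if (nr_deactivated == 0)
		return false;	/* no deactivations, nothing to judge */
	/* Compare without division: rotated/deactivated >= threshold/100. */
	return nr_rotated * 100 >= (unsigned long)threshold_pct * nr_deactivated;
}
```

For example, with threshold_pct = 50, nr_rotated = 60 and
nr_deactivated = 100 would be flagged.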

Thanks.