Re: [PATCH v4.4.y] mm, vmstat: make quiet_vmstat lighter

From: Greg KH
Date: Mon May 13 2019 - 03:03:49 EST


On Mon, May 13, 2019 at 08:12:37AM +0200, Daniel Wagner wrote:
> From: Michal Hocko <mhocko@xxxxxxxx>
>
> [ Upstream commit f01f17d3705bb6081c9e5728078f64067982be36 ]
>
> Mike has reported a considerable overhead of refresh_cpu_vm_stats from
> the idle entry during pipe test:
>
> 12.89% [kernel] [k] refresh_cpu_vm_stats.isra.12
> 4.75% [kernel] [k] __schedule
> 4.70% [kernel] [k] mutex_unlock
> 3.14% [kernel] [k] __switch_to
>
> This is caused by commit 0eb77e988032 ("vmstat: make vmstat_updater
> deferrable again and shut down on idle") which has placed quiet_vmstat
> into cpu_idle_loop. The main reason here seems to be that the idle
> entry has to get over all zones and perform atomic operations for each
> vmstat entry even though there might be no per cpu diffs. This is a
> pointless overhead for _each_ idle entry.
>
> Make sure that quiet_vmstat is as light as possible.
>
> First of all it doesn't make any sense to do any local sync if the
> current cpu is already set in oncpu_stat_off because vmstat_update puts
> itself there only if there is nothing to do.
>
> Then we can check need_update which should be a cheap way to check for
> potential per-cpu diffs and only then do refresh_cpu_vm_stats.
>
> The original patch also did cancel_delayed_work which we are not doing
> here. There are two reasons for that. Firstly cancel_delayed_work from
> idle context will blow up on RT kernels (reported by Mike):
>
> CPU: 1 PID: 0 Comm: swapper/1 Not tainted 4.5.0-rt3 #7
> Hardware name: MEDION MS-7848/MS-7848, BIOS M7848W08.20C 09/23/2013
> Call Trace:
> dump_stack+0x49/0x67
> ___might_sleep+0xf5/0x180
> rt_spin_lock+0x20/0x50
> try_to_grab_pending+0x69/0x240
> cancel_delayed_work+0x26/0xe0
> quiet_vmstat+0x75/0xa0
> cpu_idle_loop+0x38/0x3e0
> cpu_startup_entry+0x13/0x20
> start_secondary+0x114/0x140
>
> And secondly, even on !RT kernels it might add some non trivial overhead
> which is not necessary. Even if the vmstat worker wakes up and preempts
> idle then it will be most likely a single shot noop because the stats
> were already synced and so it would end up on the oncpu_stat_off anyway.
> We just need to teach both vmstat_shepherd and vmstat_update to stop
> scheduling the worker if there is nothing to do.
>
> [mgalbraith@xxxxxxx: cancel pending work of the cpu_stat_off CPU]
> Signed-off-by: Michal Hocko <mhocko@xxxxxxxx>
> Reported-by: Mike Galbraith <umgwanakikbuti@xxxxxxxxx>
> Acked-by: Christoph Lameter <cl@xxxxxxxxx>
> Signed-off-by: Mike Galbraith <mgalbraith@xxxxxxx>
> Signed-off-by: Andrew Morton <akpm@xxxxxxxxxxxxxxxxxxxx>
> Signed-off-by: Linus Torvalds <torvalds@xxxxxxxxxxxxxxxxxxxx>
> Signed-off-by: Daniel Wagner <wagi@xxxxxxxxx>
> ---
> Hi Greg,
>
> Upstream commmit 0eb77e988032 ("vmstat: make vmstat_updater deferrable
> again and shut down on idle") was back ported in v4.4.178
> (bdf3c006b9a2). For -rt we definitely need the bugfix f01f17d3705b
> ("mm, vmstat: make quiet_vmstat lighter") as well.
>
> Since the offending patch was back ported to v4.4 stable only, the
> other stable branches don't need an update (offending patch and bug
> fix are already in).
>
> Could you please queue the above patch for v4.4.y?

Now queued up, thanks.

greg k-h