Re: [RFC PATCH v3] tick/sched: Ensure quiet_vmstat() is called when the idle tick was stopped too

From: Marcelo Tosatti
Date: Mon Apr 25 2022 - 10:59:16 EST


On Mon, Apr 25, 2022 at 04:06:04PM +0200, Christoph Lameter wrote:
> On Mon, 25 Apr 2022, Peter Zijlstra wrote:
>
> > > Folding the vmstat diffs *always* when entering idle prevents unnecessary
> > > wakeups and processing in the future and also provides more accurate
> > > counters for the VM, allowing better decisions to be made on reclaim.
> >
> > I'm thinking you're going to find a ton of regressions if you try it
> > though; some workloads go idle *very* shortly, doing all this accounting
> > is going to be counter-productive.
>
> Well there is usually not much to do in terms of accounting.

static int refresh_cpu_vm_stats(bool do_pagesets)
{
	struct pglist_data *pgdat;
	struct zone *zone;
	int i;
	int global_zone_diff[NR_VM_ZONE_STAT_ITEMS] = { 0, };
	int global_node_diff[NR_VM_NODE_STAT_ITEMS] = { 0, };
	int changes = 0;

	for_each_populated_zone(zone) {
		struct per_cpu_zonestat __percpu *pzstats = zone->per_cpu_zonestats;
#ifdef CONFIG_NUMA
		struct per_cpu_pages __percpu *pcp = zone->per_cpu_pageset;
#endif

		for (i = 0; i < NR_VM_ZONE_STAT_ITEMS; i++) {
			int v;

			v = this_cpu_xchg(pzstats->vm_stat_diff[i], 0);
			if (v) {
				...

This loop is quite heavy: for every populated zone it does a this_cpu_xchg()
on each of the NR_VM_ZONE_STAT_ITEMS per-CPU diffs. Reducing the data that
has to be read to a couple of cachelines might improve it considerably.
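One possible reading of "a couple of cachelines", as a userspace sketch rather than kernel code: each CPU keeps its per-zone diffs as packed int8_t values plus a hypothetical "dirty" bitmask, so the common nothing-to-fold case costs one word read per zone instead of an xchg per item. NR_ITEMS, NR_ZONES, struct cpu_zone_diffs, mod_stat() and fold_diffs() are all hypothetical names; per-CPU access, preemption and threshold-based folding are omitted, and __builtin_ctzll assumes GCC or Clang.

```c
#include <stdint.h>

/* Hypothetical sizes; the real NR_VM_ZONE_STAT_ITEMS is different. */
#define NR_ITEMS 12
#define NR_ZONES 4

/* One CPU's diffs for one zone (24 bytes with padding), so all four
 * zones together span about two 64-byte cachelines. */
struct cpu_zone_diffs {
	uint64_t dirty;          /* bit i set => diff[i] may be non-zero */
	int8_t diff[NR_ITEMS];
};

static long global_stat[NR_ZONES][NR_ITEMS];
static struct cpu_zone_diffs pcpu[NR_ZONES];  /* "this CPU" only */

/* Record a small delta; s8 overflow/threshold handling is omitted. */
static void mod_stat(int zone, int item, int delta)
{
	pcpu[zone].diff[item] += delta;
	pcpu[zone].dirty |= 1ULL << item;
}

/* Fold this CPU's diffs into the global counters.
 * Returns the number of items whose folded diff was non-zero. */
static int fold_diffs(void)
{
	int changes = 0;

	for (int z = 0; z < NR_ZONES; z++) {
		uint64_t dirty = pcpu[z].dirty;

		if (!dirty)
			continue;        /* fast path: one word read, nothing to do */
		pcpu[z].dirty = 0;
		while (dirty) {
			int i = __builtin_ctzll(dirty);  /* lowest set bit */

			dirty &= dirty - 1;              /* clear that bit */
			if (pcpu[z].diff[i]) {
				global_stat[z][i] += pcpu[z].diff[i];
				pcpu[z].diff[i] = 0;
				changes++;
			}
		}
	}
	return changes;
}
```

The bitmask trades a little work on every update for a cheap check on idle entry; whether that is a win depends on how often the counters are actually modified between idle periods.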

> If there are
> a lot of updates then it is worthwhile because if the numbers are off too
> much then the VM has trouble assessing its own situation.
>
> It may depend though on how long the idle periods are. Do we have
> statistics on the duration? Always folding the vmstat deltas may also
> increase the length of the idle periods.

"Products such as the Intel® Optane™ SSD DC P4800X series have a read and write
latency of 10 microseconds, compared with a write latency of about 220
microseconds for a typical NAND flash SSD."
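With devices like that, a CPU waiting on I/O may only be idle for roughly 10 microseconds, versus about 220 for NAND. A minimal sketch of one possible policy for that trade-off, assuming a hypothetical estimate of the fold cost and a hypothetical budget (fold only if it costs at most 1/10 of the predicted idle period); should_fold_on_idle(), FOLD_BUDGET_DIV and the numbers used with it are illustrative, not kernel interfaces or measurements.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical budget: folding may use at most 1/FOLD_BUDGET_DIV
 * of the predicted idle period. */
#define FOLD_BUDGET_DIV 10

/* Decide whether to fold the vmstat diffs on idle entry.
 * expected_idle_ns: predicted idle duration (assumed to come from some
 *                   idle-duration predictor; hypothetical here).
 * fold_cost_ns:     estimated cost of the fold (hypothetical).
 * Realistic nanosecond values cannot overflow the multiplication. */
static bool should_fold_on_idle(uint64_t expected_idle_ns,
				uint64_t fold_cost_ns)
{
	return fold_cost_ns * FOLD_BUDGET_DIV <= expected_idle_ns;
}
```

For example, with an assumed 2 microsecond fold cost, should_fold_on_idle(10000, 2000) is false (a 10 microsecond Optane-style wait) while should_fold_on_idle(220000, 2000) is true (a 220 microsecond NAND wait).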