Re: [PATCH v8 5/5] tick/sched: Ensure quiet_vmstat() is called when the idle tick was stopped too

From: Marcelo Tosatti
Date: Mon Oct 17 2022 - 12:06:21 EST


On Mon, Oct 03, 2022 at 08:44:35PM +0800, Hillf Danton wrote:
> On 26 Sep 2022 10:20:04 +0100 Aaron Tomlin <atomlin@xxxxxxxxxx> wrote:
> > On Sun 2022-09-25 09:05 +0800, Hillf Danton wrote:
> > > On 24 Sep 2022 16:24:41 +0100 Aaron Tomlin <atomlin@xxxxxxxxxx> wrote:
> > > >
> > > > In the context of the idle task and an adaptive-tick mode/or a nohz_full
> > > > CPU, quiet_vmstat() can be called: before stopping the idle tick,
> > > > entering an idle state and on exit. In particular, for the latter case,
> > > > when the idle task is required to reschedule, the idle tick can remain
> > > > stopped and the timer expiration time endless i.e., KTIME_MAX. Now,
> > > > indeed before a nohz_full CPU enters an idle state, CPU-specific vmstat
> > > > counters should be processed to ensure the respective values have been
> > > > reset and folded into the zone specific 'vm_stat[]'. That being said, it
> > > > can only occur when: the idle tick was previously stopped, and
> > > > reprogramming of the timer is not required.
> > > >
> > > > A customer provided some evidence which indicates that the idle tick was
> > > > stopped; albeit, CPU-specific vmstat counters still remained populated.
> > > > Thus one can only assume quiet_vmstat() was not invoked on return to the
> > > > idle loop.
> > >
> > > Why did housekeeping CPUs fail to do their works, with this assumption
> > > put aside?
> >
> > Hi Hillf,
> >
> > I'm not sure I understand your question.
> >
> > In this context, when tick processing is stopped, delayed work is not going
> > to be handled until the CPU exits idle.
>
> Given work canceled because per-CPU pages can be freed remotely from
> housekeeping CPUs (see patch 3/5), what is added here is not needed.
>
> IOW which one is incorrect?
>
> BTW given delayed work is not going to be handled until the CPU exits idle,

Hi Hillf,

The comment in the codebase currently reads:

void quiet_vmstat(void)
{
	if (system_state != SYSTEM_RUNNING)
		return;

	if (!delayed_work_pending(this_cpu_ptr(&vmstat_work)))
		return;

	if (!need_update(smp_processor_id()))
		return;

	/*
	 * Just refresh counters and do not care about the pending delayed
	 * vmstat_update. It doesn't fire that often to matter and canceling
	 * it would be too expensive from this path.
	 * vmstat_shepherd will take care about that for us.
	 */
	refresh_cpu_vm_stats(false);
}

However, this is incorrect. The pending delayed work only goes away once
it has executed and has not been requeued from:

static void vmstat_update(struct work_struct *w)
{
	if (refresh_cpu_vm_stats(true)) {
		/*
		 * Counters were updated so we expect more updates
		 * to occur in the future. Keep on running the
		 * update worker thread.
		 */
		queue_delayed_work_on(smp_processor_id(), mm_percpu_wq,
				this_cpu_ptr(&vmstat_work),
				round_jiffies_relative(sysctl_stat_interval));
	}
}

Since this patchset changes the synchronization to happen at return to
userspace or entering idle, we do want to cancel that work (which, after
synchronization, is not necessary).
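
To illustrate the point, here is a toy user-space model of the behaviour
described above. It is only a sketch and not kernel code: struct cpu_model,
global_vm_stat and all of the model_*() helpers are hypothetical names made
up for illustration, and a plain boolean stands in for
delayed_work_pending(). It shows that folding the counters alone leaves the
work pending until it runs once more, whereas a variant that also cancels
leaves nothing behind for the kworker.

```c
#include <stdbool.h>

/* Toy model only; none of these names exist in the kernel. */
struct cpu_model {
	int  vm_stat_diff;	/* per-CPU delta not yet folded */
	bool work_pending;	/* stands in for delayed_work_pending() */
};

static long global_vm_stat;	/* stands in for the global vm_stat[] */

/* Fold the per-CPU delta; return true if anything was folded. */
static bool model_refresh(struct cpu_model *c)
{
	bool changed = c->vm_stat_diff != 0;

	global_vm_stat += c->vm_stat_diff;
	c->vm_stat_diff = 0;
	return changed;
}

/* Models vmstat_update(): requeue only if counters were folded. */
static void model_vmstat_update(struct cpu_model *c)
{
	c->work_pending = false;	/* the work item is now running */
	if (model_refresh(c))
		c->work_pending = true;	/* requeue itself */
}

/* Models the current quiet_vmstat(): fold, but leave the work pending. */
static void model_quiet_vmstat(struct cpu_model *c)
{
	if (!c->work_pending)
		return;
	if (c->vm_stat_diff == 0)
		return;
	model_refresh(c);
}

/* Hypothetical variant: fold and also drop the pending work. */
static void model_quiet_vmstat_cancel(struct cpu_model *c)
{
	model_refresh(c);
	c->work_pending = false;
}
```

In this model, after model_quiet_vmstat() the work is still pending and
only disappears after one more model_vmstat_update() run, which corresponds
to the kworker wakeup on the isolated CPU.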

> canceling work is noop in 3/5, despite what the vmstat shepherd does depends
> not on tick.

Canceling the work is not a noop in 3/5: if the work is not cancelled (i.e.,
if 3/5 is dropped), there will be pending work to be executed from the
kworker thread on an isolated CPU, which is undesirable for a fully
isolated CPU that should see no interruptions.