Re: [RFC][PATCH 1/3] cpuidle: Inject tick boundary state

From: Peter Zijlstra
Date: Mon Jul 31 2023 - 05:12:23 EST


On Mon, Jul 31, 2023 at 10:01:53AM +0200, Rafael J. Wysocki wrote:
> On Sat, Jul 29, 2023 at 10:44 AM Peter Zijlstra <peterz@xxxxxxxxxxxxx> wrote:
> >
> > On Fri, Jul 28, 2023 at 05:36:55PM +0200, Rafael J. Wysocki wrote:
> > > On Fri, Jul 28, 2023 at 5:01 PM Peter Zijlstra <peterz@xxxxxxxxxxxxx> wrote:
> > > >
> > > > In order to facilitate governors that track history in idle-state
> > > > buckets (TEO) making a useful decision about NOHZ, make sure we have a
> > > > bucket that counts tick-and-longer.
> > > >
> > > > In order to be inclusive of the tick itself -- after all, if we do not
> > > > disable NOHZ we'll sleep for a full tick, the actual boundary should
> > > > be just short of a full tick.
> > > >
> > > > IOW, when registering the idle-states, add one that is always
> > > > disabled, just to have a bucket.
> > >
> > > This extra bucket can be created in the governor itself, can't it?
> >
> > I couldn't find a nice spot for the governor to add idle-states.
>
> Well, I've thought this through and recalled a couple of things and my
> conclusion is that the decision whether or not to stop the tick really
> depends on the idle state choice.
>
> There are three cases:
>
> 1. The selected idle state is shallow (that is, its target residency
> is below the tick period length), but it is not the deepest one.
> 2. The selected idle state is shallow, but it is the deepest one (or
> at least the deepest enabled one).
> 3. The selected idle state is deep (that is, its target residency is
> above the tick length period).
>
> In case 1, the tick should not be stopped so as to prevent the CPU
> from spending too much time in a suboptimal idle state.
>
> In case 3, the tick needs to be stopped, because otherwise the target
> residency of the selected state would not be met.
>
> Case 2 is somewhat a mixed bag, but generally speaking stopping the
> tick doesn't hurt if the selected idle state is the deepest one,
> because in that case the governor kind of expects a significant exit
> latency anyway. If it is not the deepest one (which is disabled),
> it's better to let the tick tick.

So I agree with 1.

I do not agree with 2. Disabling the tick is costly; doubly so with the
timer-pull thing, but it is costly even today. Simply disabling it because
we picked the deepest idle state, irrespective of the expected duration, is
wrong as it will incur this significant cost.
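
To make the disagreement concrete, here is a rough sketch in plain C. None
of it is kernel code: struct idle_state_sketch, stop_tick_by_state() and
stop_tick_by_duration() are names made up for illustration. The first
function is my reading of the three cases quoted above; the second is one
possible duration-based rule, assuming an estimate of the sleep length is
available from somewhere.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical, simplified idle state; not the kernel's struct cpuidle_state. */
struct idle_state_sketch {
	uint64_t target_residency_ns;
};

/*
 * Quoted policy: the tick decision follows from the state choice alone.
 * States are assumed to be sorted from shallowest to deepest.
 */
static bool stop_tick_by_state(const struct idle_state_sketch *s, int n,
			       int idx, uint64_t tick_ns)
{
	/* Case 3: deep state, target residency above the tick period. */
	if (s[idx].target_residency_ns > tick_ns)
		return true;

	/* Case 2: shallow, but the deepest state there is. */
	if (idx == n - 1)
		return true;

	/* Case 1: shallow and not the deepest, keep the tick. */
	return false;
}

/*
 * Duration-based alternative (an assumption, not an existing interface):
 * only pay for stopping the tick when we expect to sleep at least a tick.
 */
static bool stop_tick_by_duration(uint64_t expected_sleep_ns, uint64_t tick_ns)
{
	return expected_sleep_ns >= tick_ns;
}
```

With a 4ms tick and two shallow states (say 1us and 100us target residency),
picking the 100us state stops the tick under the first rule no matter how
short the expected sleep is; under the second rule a 200us expected sleep
leaves the tick running.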

With 3 there is the question of how we get the expected sleep duration;
this is especially important with timer-pull, where we have this
chicken-and-egg thing.

Notably: tick_nohz_get_sleep_length() wants to know if the tick gets
disabled and cpuidle wants to use tick_nohz_get_sleep_length() to
determine whether to disable the tick. This cycle needs to be broken for
timer-pull.

Hence my proposal to introduce the extra tick state, which allows fixing
both 2 and 3.
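
For illustration only, the idea of the extra state can be sketched like
this. It is a standalone approximation, not the patch itself: struct
bucket_state, inject_tick_state() and the margin_ns parameter are
hypothetical, and how far "just short of a full tick" should be is left
open here.

```c
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical, simplified idle state used only as a history bucket. */
struct bucket_state {
	uint64_t target_residency_ns;
	bool disabled;
};

/*
 * Insert an always-disabled placeholder whose target residency sits just
 * below the tick period, keeping the array sorted by target residency.
 * Returns the new number of states, or -1 if there is no room or the
 * margin is not smaller than the tick.
 */
static int inject_tick_state(struct bucket_state *s, int n, int cap,
			     uint64_t tick_ns, uint64_t margin_ns)
{
	uint64_t boundary;
	int pos = 0;

	if (n >= cap || margin_ns >= tick_ns)
		return -1;

	boundary = tick_ns - margin_ns;

	/* Find the first state at or beyond the boundary. */
	while (pos < n && s[pos].target_residency_ns < boundary)
		pos++;

	/* Shift the deeper states up by one slot to make room. */
	memmove(&s[pos + 1], &s[pos], (size_t)(n - pos) * sizeof(*s));

	s[pos].target_residency_ns = boundary;
	s[pos].disabled = true;	/* never selected, it only provides a bucket */

	return n + 1;
}
```

A governor that keeps per-state history would then get a bucket counting
"tick and longer" without ever being able to select the placeholder.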