Re: [RFC][PATCH 1/3] cpuidle: Inject tick boundary state

From: Rafael J. Wysocki
Date: Mon Jul 31 2023 - 06:35:37 EST


On Mon, Jul 31, 2023 at 11:11 AM Peter Zijlstra <peterz@xxxxxxxxxxxxx> wrote:
>
> On Mon, Jul 31, 2023 at 10:01:53AM +0200, Rafael J. Wysocki wrote:
> > On Sat, Jul 29, 2023 at 10:44 AM Peter Zijlstra <peterz@xxxxxxxxxxxxx> wrote:
> > >
> > > On Fri, Jul 28, 2023 at 05:36:55PM +0200, Rafael J. Wysocki wrote:
> > > > On Fri, Jul 28, 2023 at 5:01 PM Peter Zijlstra <peterz@xxxxxxxxxxxxx> wrote:
> > > > >
> > > > > In order to facilitate governors that track history in idle-state
> > > > > buckets (TEO) making a useful decision about NOHZ, make sure we have a
> > > > > bucket that counts tick-and-longer.
> > > > >
> > > > > In order to be inclusive of the tick itself -- after all, if we do not
> > > > > disable NOHZ we'll sleep for a full tick, the actual boundary should
> > > > > be just short of a full tick.
> > > > >
> > > > > IOW, when registering the idle-states, add one that is always
> > > > > disabled, just to have a bucket.
> > > >
> > > > This extra bucket can be created in the governor itself, can't it?
> > >
> > > I couldn't find a nice spot for the governor to add idle-states.
> >
> > Well, I've thought this through and recalled a couple of things and my
> > conclusion is that the decision whether or not to stop the tick really
> > depends on the idle state choice.
> >
> > There are three cases:
> >
> > 1. The selected idle state is shallow (that is, its target residency
> > is below the tick period length), but it is not the deepest one.
> > 2. The selected idle state is shallow, but it is the deepest one (or
> > at least the deepest enabled one).
> > 3. The selected idle state is deep (that is, its target residency is
> > above the tick length period).
> >
> > In case 1, the tick should not be stopped so as to prevent the CPU
> > from spending too much time in a suboptimal idle state.
> >
> > In case 3, the tick needs to be stopped, because otherwise the target
> > residency of the selected state would not be met.
> >
> > Case 2 is somewhat a mixed bag, but generally speaking stopping the
> > tick doesn't hurt if the selected idle state is the deepest one,
> > because in that case the governor kind of expects a significant exit
> > latency anyway. If it is not the deepest one (which is disabled),
> > it's better to let the tick tick.
>
> So I agree with 1.
>
> I do not agree with 2. Disabling the tick is costly, doubly so with the
> timer-pull thing, but even today. Simply disabling it because we picked
> the deepest idle state, irrespective of the expected duration is wrong
> as it will incur this significant cost.
>
> With 3 there is the question of how we get the expected sleep duration;
> this is especially important with timer-pull, where we have this
> chicken-and-egg thing.
>
> Notably: tick_nohz_get_sleep_length() wants to know if the tick gets
> disabled

Well, it shouldn't. Or at least it didn't before.

It is expected to produce two values, one with the tick stopped (this
is the return value of the function) and the other with the tick
ticking (this is the one written under the address passed as the arg).
This cannot depend on whether or not the tick will be stopped. Both
are good to know.

Now, I understand that getting these two values may be costly, so
there is an incentive to avoid calling it, but then the governor needs
to figure this out from its crystal ball and so care needs to be taken
to limit the possible damage in case the crystal ball is not right.

> and cpuilde wants to use tick_nohz_get_sleep_length() to
> determine if to disable the tick. This cycle needs to be broken for
> timer-pull.
>
> Hence my proposal to introduce the extra tick state, that allows fixing
> both 2 and 3.

I'm not sure about 3 TBH.

Say there are 2 idle states, one shallow (say its target residency is
10 us) and one deep (say its target residency is T = 2 * TICK_NSEC).

Currently, there are 3 bins in this case, 0 (0 - 10 us), 1 (10 us - T)
and 2 (T - infinity) and the governor will need to check bins 1 and 2
unless it somehow knows that it will use state 0.

Note that all of the events falling into bin 1 will cause the governor
to select state 0 more often, so I don't see how adding one more bin
between 1 and 2 changes this.