Re: [RFC][PATCH 1/3] cpuidle: Inject tick boundary state

From: Rafael J. Wysocki
Date: Mon Jul 31 2023 - 12:55:56 EST


On Mon, Jul 31, 2023 at 1:39 PM Peter Zijlstra <peterz@xxxxxxxxxxxxx> wrote:
>
> On Mon, Jul 31, 2023 at 12:35:20PM +0200, Rafael J. Wysocki wrote:
>
> > > So I agree with 1.
> > >
> > > I do not agree with 2. Disabling the tick is costly, doubly so with the
> > > timer-pull thing, but even today. Simply disabling it because we picked
> > > the deepest idle state, irrespective of the expected duration is wrong
> > > as it will incur this significant cost.
> > >
> > > With 3 there is the question of how we get the expected sleep duration;
> > > this is especially important with timer-pull, where we have this
> > > chicken-and-egg thing.
> > >
> > > Notably: tick_nohz_get_sleep_length() wants to know if the tick gets
> > > disabled
> >
> > Well, it shouldn't. Or at least it didn't before.
>
> Correct, this is new in the timer-pull thing.
>
> > It is expected to produce two values, one with the tick stopped (this
> > is the return value of the function) and the other with the tick
> > ticking (this is the one written under the address passed as the arg).
> > This cannot depend on whether or not the tick will be stopped. Both
> > are good to know.
> >
> > Now, I understand that getting these two values may be costly, so
> > there is an incentive to avoid calling it, but then the governor needs
> > to figure this out from its crystal ball and so care needs to be taken
> > to limit the possible damage in case the crystal ball is not right.
>
> If we can get the governor to decide the tick state up-front we can
> avoid a lot of the expensive parts.

I claim that in the vast majority of cases this is the same as
deciding the state.

The case when it is not is when the target residency of the deepest
idle state is below the tick period length and the governor is about
to select that state. According to the data I've seen so far this is
a tiny fraction of all the cases.

> > > and cpuilde wants to use tick_nohz_get_sleep_length() to
> > > determine if to disable the tick. This cycle needs to be broken for
> > > timer-pull.
> > >
> > > Hence my proposal to introduce the extra tick state, that allows fixing
> > > both 2 and 3.
> >
> > I'm not sure about 3 TBH.
> >
> > Say there are 2 idle states, one shallow (say its target residency is
> > 10 us) and one deep (say its target residency is T = 2 * TICK_NSEC).
>
> This is the easy case and that actually 'works' today.

But this is my case 3 which you said you didn't agree with. I don't
see why it needs to be fixed.

> The interesting case is where your deepest state has a target residency that
> is below the tick (because for HZ=100, we have a 10ms tick and pretty
> much all idle states are below that).

What about HZ=1000, though?

> In that case you cannot tell the difference between I'm good to use this
> state and I'm good to disable the tick and still use this state.

No, you don't, but is it really worth the fuss?

The state is high-latency anyway and tick_nohz_get_sleep_length()
needs to be called anyway in that case in order to determine if a
shallower state wouldn't be better. At this point the statistics have
already told the governor otherwise and a misprediction would be a
double loss.

So yes, you can gain performance by avoiding to call
tick_nohz_get_sleep_length(), but then you can also lose it by
selecting a state that is too deep (and that can be determined exactly
with respect to timers).