Re: [PATCH v1] cpuidle: teo: Update idle duration estimate when choosing shallower state

From: Peter Zijlstra
Date: Sat Jul 29 2023 - 05:03:08 EST


On Thu, Jul 27, 2023 at 10:12:56PM +0200, Rafael J. Wysocki wrote:
> On Thu, Jul 27, 2023 at 10:05 PM Rafael J. Wysocki <rjw@xxxxxxxxxxxxx> wrote:
> >
> > From: Rafael J. Wysocki <rafael.j.wysocki@xxxxxxxxx>
> >
> > The TEO governor takes CPU utilization into account by refining idle state
> > selection when the utilization is above a certain threshold. The idle state
> > selection is then refined by choosing an idle state shallower than the
> > previously selected one.
> >
> > However, when this is done, the idle duration estimate needs to be updated
> > so as to prevent the scheduler tick from being stopped while the candidate
> > idle state is shallow, which may lead to excessive energy usage if the CPU
> > is not interrupted quickly enough going forward. Moreover, in case the
> > scheduler tick has been stopped already and the new idle duration estimate
> > is too small, the replacement candidate state cannot be used.
> >
> > Modify the relevant code to take the above observations into account.
> >
> > Fixes: 9ce0f7c4bc64 ("cpuidle: teo: Introduce util-awareness")
> > Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@xxxxxxxxx>
> > ---
> >
> > @Peter: This doesn't attempt to fix the tick stopping problem, it just makes
> > the current behavior consistent.
> >
> > @Anna-Maria: This is likely to basically prevent the tick from being stopped
> > at all if the CPU utilization is above a certain threshold. I'm wondering if
> > your results will be affected by it and in what way.
> >
> > ---
> > drivers/cpuidle/governors/teo.c | 33 ++++++++++++++++++++++++++-------
> > 1 file changed, 26 insertions(+), 7 deletions(-)
> >
> > Index: linux-pm/drivers/cpuidle/governors/teo.c
> > ===================================================================
> > --- linux-pm.orig/drivers/cpuidle/governors/teo.c
> > +++ linux-pm/drivers/cpuidle/governors/teo.c
> > @@ -397,13 +397,22 @@ static int teo_select(struct cpuidle_dri
> >  	 * the shallowest non-polling state and exit.
> >  	 */
> >  	if (drv->state_count < 3 && cpu_data->utilized) {
> > -		for (i = 0; i < drv->state_count; ++i) {
> > -			if (!dev->states_usage[i].disable &&
> > -			    !(drv->states[i].flags & CPUIDLE_FLAG_POLLING)) {
> > -				idx = i;
> > +		/*
> > +		 * If state 0 is enabled and it is not a polling one, select it
> > +		 * right away and update the idle duration estimate accordingly,
> > +		 * unless the scheduler tick has been stopped.
> > +		 */
> > +		if (!idx && !(drv->states[0].flags & CPUIDLE_FLAG_POLLING)) {
> > +			s64 span_ns = teo_middle_of_bin(0, drv);
> > +
> > +			if (teo_time_ok(span_ns)) {
> > +				duration_ns = span_ns;
> >  				goto end;
> >  			}
> >  		}
> > +		/* Assume that state 1 is not a polling one and select it. */
>
> Well, I should also check if it is not disabled. Will send a v2 tomorrow.
>
> > +		idx = 1;
> > +		goto end;
> >  	}
> >
> >  	/*
> > @@ -539,10 +548,20 @@ static int teo_select(struct cpuidle_dri
> >
> >  	/*
> >  	 * If the CPU is being utilized over the threshold, choose a shallower
> > -	 * non-polling state to improve latency
> > +	 * non-polling state to improve latency, unless the scheduler tick has
> > +	 * been stopped already and the shallower state's target residency is
> > +	 * not sufficiently large.
> >  	 */
> > -	if (cpu_data->utilized)
> > -		idx = teo_find_shallower_state(drv, dev, idx, duration_ns, true);
> > +	if (cpu_data->utilized) {
> > +		s64 span_ns;
> > +
> > +		i = teo_find_shallower_state(drv, dev, idx, duration_ns, true);
> > +		span_ns = teo_middle_of_bin(i, drv);
> > +		if (teo_time_ok(span_ns)) {
> > +			idx = i;
> > +			duration_ns = span_ns;
> > +		}
> > +	}

So I'm not a huge fan of that utilized thing to begin with; it feels
like a hack. I think my patch 3 would achieve much the same, because if
the CPU is busy, you'll have short idles, which will drive the
hit+intercept metrics to favour shallow states, and voila.
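
To illustrate the argument, here is a toy model of it. This is not
teo.c and not the patch in question: every toy_* name, the residency
values, the decay constants and the "more than half of the recent
weight" rule are assumptions made up for this sketch. It only shows
how short idles on a busy CPU could push the selection towards shallow
states through the hit/intercept history alone, without a separate
utilization check.

```c
#include <stdint.h>

/* Toy model only: every toy_* name and every constant is hypothetical. */
#define TOY_NR_STATES	3
#define TOY_PULSE	1024	/* weight added per observed idle period */
#define TOY_DECAY_SHIFT	3	/* each update forgets 1/8 of the old weight */

/* Assumed target residencies, shallowest state first. */
static const int64_t toy_residency_ns[TOY_NR_STATES] = { 0, 20000, 200000 };

struct toy_bin {
	unsigned int hits;		/* idle lasted about as long as the timer said */
	unsigned int intercepts;	/* CPU was woken up well before the timer */
};

struct toy_cpu {
	struct toy_bin bins[TOY_NR_STATES];
};

/* Deepest state whose target residency fits into @ns. */
static int toy_bin_of(int64_t ns)
{
	int i, idx = 0;

	for (i = 1; i < TOY_NR_STATES; i++)
		if (toy_residency_ns[i] <= ns)
			idx = i;
	return idx;
}

static unsigned int toy_weight(const struct toy_cpu *cpu, int i)
{
	return cpu->bins[i].hits + cpu->bins[i].intercepts;
}

/* Account one idle period: decay the history, then credit one bin. */
static void toy_update(struct toy_cpu *cpu, int64_t sleep_ns, int64_t measured_ns)
{
	int sleep_bin = toy_bin_of(sleep_ns);
	int measured_bin = toy_bin_of(measured_ns);
	int i;

	for (i = 0; i < TOY_NR_STATES; i++) {
		cpu->bins[i].hits -= cpu->bins[i].hits >> TOY_DECAY_SHIFT;
		cpu->bins[i].intercepts -= cpu->bins[i].intercepts >> TOY_DECAY_SHIFT;
	}

	if (measured_bin < sleep_bin)
		cpu->bins[measured_bin].intercepts += TOY_PULSE;
	else
		cpu->bins[sleep_bin].hits += TOY_PULSE;
}

/*
 * Start from the state matching the next timer and go shallower for as
 * long as more than half of the recent weight sits in shallower bins.
 */
static int toy_select(const struct toy_cpu *cpu, int64_t sleep_ns)
{
	unsigned int total = 0, below = 0;
	int idx = toy_bin_of(sleep_ns);
	int i;

	for (i = 0; i < TOY_NR_STATES; i++)
		total += toy_weight(cpu, i);
	for (i = 0; i < idx; i++)
		below += toy_weight(cpu, i);

	while (idx > 0 && 2 * below > total) {
		idx--;
		below -= toy_weight(cpu, idx);
	}
	return idx;
}
```

Feed it a run of short wakeups via toy_update() with the timer still
far away and toy_select() drops to the shallowest state; feed it long
idles again and the old intercepts decay, so it goes back to the deep
state without anyone having to pick a busy threshold.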

I didn't take it out -- yet -- because I haven't had much time to
evaluate it.

Simply going one state shallower at an arbitrary busy threshold is
duct tape if ever I saw any.