Re: [RFC][PATCH 2/3] cpuidle,teo: Improve NOHZ management

From: Peter Zijlstra
Date: Mon Jul 31 2023 - 08:02:22 EST

Next message: Alexander Usyskin: "[PATCH v2] mtd: fix use-after-free in mtd release"
Previous message: Baolin Wang: "Re: [PATCH 1/8] mm/compaction: avoid missing last page block in section after skip offline sections"
In reply to: Rafael J. Wysocki: "Re: [RFC][PATCH 2/3] cpuidle,teo: Improve NOHZ management"
Next in thread: Rafael J. Wysocki: "Re: [RFC][PATCH 2/3] cpuidle,teo: Improve NOHZ management"
Messages sorted by: [ date ] [ thread ] [ subject ] [ author ]

On Mon, Jul 31, 2023 at 12:17:27PM +0200, Rafael J. Wysocki wrote:

> Something really simple like:
>
> 1. Check sched_cpu_util() (which is done by teo anyway).
> 2. If that is around 90% of the maximum CPU capacity, select the first
> non-polling idle state and be done (don't stop the tick, as per my other
> reply earlier today).

So I really don't like using cpu_util() here, yes, 90% is a high number,
but it doesn't say *anything* about the idle duration. Remember, this is
a 32ms window, so 90% of that is 28.8ms.

(not entirely accurate, since it's an exponential average, but that
doesn't change the overall argument, only some of the particulars)

That is, 90% util, at best, says there is no idle longer than 3.2 ms.
But that is still vastly longer than pretty much all residencies. Heck,
that is still 3 ticks worth of HZ=1000 ticks. So 90% util should not
preclude disabling the tick (at HZ=1000).

Now, typically this won't be the case, and at 90% you'll have lots of
small idles adding up to 3.2ms total idle. But the point is, you can't
tell the difference. And as such util is a horrible measure to use for
cpuidle.
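
To make the arithmetic above concrete, here is a minimal sketch in C. It
assumes a flat 32 ms window rather than the exponential average the real
signal uses, and the names WINDOW_US, util_permille() and longest_idle_us()
are hypothetical helpers made up for this illustration, not kernel APIs.

```c
#include <stddef.h>
#include <stdint.h>

/* Assumption: a flat 32 ms window; the real signal is an exponential average. */
#define WINDOW_US 32000u

/* Hypothetical helper: utilization in permille for the given idle intervals (us). */
static unsigned int util_permille(const uint32_t *idle_us, size_t n)
{
	uint64_t idle = 0;

	for (size_t i = 0; i < n; i++)
		idle += idle_us[i];
	if (idle > WINDOW_US)
		idle = WINDOW_US;	/* cannot be idle for longer than the window */

	return (unsigned int)((WINDOW_US - idle) * 1000u / WINDOW_US);
}

/* Hypothetical helper: the longest single idle interval (us), which is what cpuidle cares about. */
static uint32_t longest_idle_us(const uint32_t *idle_us, size_t n)
{
	uint32_t longest = 0;

	for (size_t i = 0; i < n; i++)
		if (idle_us[i] > longest)
			longest = idle_us[i];

	return longest;
}
```

Fed one 3200 us idle period, or 32 idle periods of 100 us each, 
util_permille() returns 900 in both cases, while longest_idle_us() returns
3200 and 100 respectively. The utilization figure alone cannot separate the
two patterns.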

> > If we track the tick+ bucket -- as
> > we must in order to say anything useful about it, then we can decide the
> > tick state before (as I do here) calling sleep_length().
> >
> > The timer-pull rework from Anna-Maria unfortunately makes the
> > tick_nohz_get_sleep_length() thing excessively expensive and it really
> > doesn't make sense to call it when we retain the tick.
> >
> > It's all a bit of a chicken-egg situation, cpuidle wants to know when
> > the next timer is, but telling when that is, wants to know if the tick
> > stays. We need to break that somehow -- I propose by not calling it when
> > we know we'll keep the tick.
>
> By selecting a state whose target residency will not be met, we lose
> on both energy and performance, so doing this really should be
> avoided, unless the state is really shallow in which case there may be
> no time for making this consideration.

I'm not sure how that relates to what I propose above. By adding the
tick+ bucket we have more historical information as related to the tick
boundary, how does that make us select states we won't match residency
for?
