Re: Loongson (and other $ARCHs?) idle VS timer enqueue

From: Frederic Weisbecker
Date: Fri Apr 21 2023 - 12:47:37 EST


On Fri, Apr 21, 2023 at 04:24:46PM +0200, Peter Zijlstra wrote:
> On Fri, Apr 21, 2023 at 02:36:52PM +0200, Frederic Weisbecker wrote:
> > Hi,
> >
> > I'm looking at the __arch_cpu_idle() implementation in Loongarch
> > and I'm wondering about the rollback code. I don't understand well
> > that code but with the help from PeterZ I might have seen something,
> > so tell me if I'm wrong: when an interrupt happens within
> > __arch_cpu_idle(), handle_vint() rolls back the return value to the
> > beginning of __arch_cpu_idle(), so that TIF_NEED_RESCHED is checked
> > again. Is that correct?
>
> Loongson copied this crap from MIPS, so they are direct affected too.

Right.

>
> > Because if an interrupt fires while in __arch_cpu_idle(), that IRQ
> > might enqueue a new timer and that new timer needs to be reprogrammed
> > from the main idle loop and just checking TIF_NEED_RESCHED doesn't
> > tell about that information.
>
> Notably; this is only relevant to NOHZ, right?

Indeed.

> > And set that from the timer enqueue in idle time and check TIF_IDLE_EXIT
> > on idle callback. It depends how many architectures are concerned by this.
> > All I know so far is:
>
> The alternative is changing kernel/entry/common.c:irqentry_exit() to add
> a nohz callback next to ct_irq_exit(), and have that reprogram the timer
> if/when we're in NOHZ mode.

We used to do that but Rafael rewrote the thing a few years ago in order for
the cpuidle governor to know about the next timer event as a heuristic to
predict the best c-state, and actually decide if it's worth stopping the
tick.

So cpuidle_select() eventually calls tick_nohz_get_sleep_length() in the
beginning of the idle loop to know the next timer event (without stopping the
tick yet), on top of that and other informations, tick is stopped or not
(cf: stop_tick argument in cpuidle_select()).

If an IRQ wakes up the CPU and queues a timer, we need to go through that
whole process again, otherwise we shortcut cpuidle C-state update.

> *HOWEVER*
>
> intel_idle_irq() is affected -- except that we only (normally) use that
> for very shallow idle states and it won't interact with NOHZ (because we
> only disable the tick for deep idle states).

Well I don't know, that doesn't look comfortable... :)

Also why does it need to enable IRQs if ecx=1 ?

> > * Need to check all other archs
> >
> > I'm trying to find an automated way to debug this kind of issue but it's not
> > easy...
>
> Yeah, too much arch code :/ Easiest might be to check if our idea of
> where the timer should be and the hardware agree on IRQ entry or so --
> *any* IRQ. That will miss a lot of cases, but at least it's something.

Hmm, not sure I understand what you're suggesting...

Thanks.