Re: Loongson (and other $ARCHs?) idle VS timer enqueue

From: Peter Zijlstra
Date: Fri Apr 21 2023 - 10:25:14 EST


On Fri, Apr 21, 2023 at 02:36:52PM +0200, Frederic Weisbecker wrote:
> Hi,
>
> I'm looking at the __arch_cpu_idle() implementation in Loongarch
> and I'm wondering about the rollback code. I don't understand well
> that code but with the help from PeterZ I might have seen something,
> so tell me if I'm wrong: when an interrupt happens within
> __arch_cpu_idle(), handle_vint() rolls back the return value to the
> beginning of __arch_cpu_idle(), so that TIF_NEED_RESCHED is checked
> again. Is that correct?

Loongson copied this crap from MIPS, so they are direct affected too.

> Because if an interrupt fires while in __arch_cpu_idle(), that IRQ
> might enqueue a new timer and that new timer needs to be reprogrammed
> from the main idle loop and just checking TIF_NEED_RESCHED doesn't
> tell about that information.

Notably; this is only relevant to NOHZ, right?

>
> More generally IRQs must _not_ be re-enabled between cpuidle_select()
> (or just tick_nohz_idle_stop_tick() if no cpuidle support) and the
> last halting ASM instruction. If that happens there must be
> a mechanism to cope with that and make sure we return to the main
> idle loop.
>
> If that mechanism has to go through rollback (I wish your arch allows
> you to find a simpler and less error prone mechanism through), then the
> rollback must actually fast forward to after the halting instruction
> so that the main idle loop re-checks the timers. But then
> __arch_cpu_idle() alone is not enough to be part of the fastforward
> section, it has to start before the raw_local_irq_enable() in
> arch_cpu_idle().
>
> Another way to cope with this would be to have:
>
> #define TIF_IDLE_TIMER ...
> #define TIF_IDLE_EXIT (TIF_NEED_RESCHED | TIF_IDLE_TIMER)
>
> And set that from the timer enqueue in idle time and check TIF_IDLE_EXIT
> on idle callback. It depends how many architectures are concerned by this.
> All I know so far is:

The alternative is changing kernel/entry/common.c:irqentry_exit() to add
a nohz callback next to ct_irq_exit(), and have that reprogram the timer
if/when we're in NOHZ mode.

> * mwait based mechanism should be fine if called with IRQs disabled (or
> sti is called right before) but then we must be sure that IRQs have never
> been even temporarily re-enabled between cpuidle_select() and mwait. How
> to detect that kind of mistake?

mwait_idle_with_hints() is unaffected since it will idle with IRQs
disabled and wait on IRQ-pending (without taking the interrupt).

*HOWEVER*

intel_idle_irq() is affected -- except that we only (normally) use that
for very shallow idle states and it won't interact with NOHZ (because we
only disable the tick for deep idle states).

sti_mwait() relies on the STI shadow, just like hlt below.

> * wfi based mechanism look fine, but again we must make sure IRQs have
> never been re-enabled.

Yes, arm64 WFI has interrupts disabled and will wake on IRQ-pending

> * sti;hlt should be fine but again...

Indeed, STI has a shadow where the effect of enabling interrupts lags
one instruction, so the HLT being in that shadow actually still runs
with IRQs disabled. Any interrupt will then hit when HLT is in effect
and wake up.

> * Need to check all other archs
>
> I'm trying to find an automated way to debug this kind of issue but it's not
> easy...

Yeah, too much arch code :/ Easiest might be to check if our idea of
where the timer should be and the hardware agree on IRQ entry or so --
*any* IRQ. That will miss a lot of cases, but at least it's something.