Re: [PATCH 2/2] genirq: fasteoi resends interrupt on concurrent invoke

From: Gowans, James
Date: Thu Jun 01 2023 - 03:25:04 EST


On Wed, 2023-05-31 at 08:00 +0100, Marc Zyngier wrote:
> > Generally it should not be possible for the next interrupt to arrive
> > while the previous handler is still running: the next interrupt should
> > only arrive after the EOI message has been sent and the previous handler
> > has returned.
>
> There is no such message with LPIs. I pointed that out previously.

Arg, thanks, I'll re-word this to:

"Generally it should not be possible for the next interrupt to arrive
while the previous handler is still running: the CPU will not preempt an
interrupt with another from the same source or same priority."

I hope that's more accurate?

> > This issue was observed specifically on an arm64 system with a GIC-v3
> > handling MSIs; GIC-v3 uses the handle_fasteoi_irq handler. The issue is
> > that the global ITS is responsible for affinity but does not know
> > whether interrupts are pending/running, only the CPU-local redistributor
> > handles the EOI. Hence when the affinity is changed in the ITS, the new
> > CPU's redistributor does not know that the original CPU is still running
> > the handler.
>
> Similar to your previous patch, you don't explain *why* the interrupt
> gets delivered when it is an LPI, and not for any of the other GICv3
> interrupt types. That's an important point.

Right, you pointed out the issue with this sentence too and I missed
updating it. :-/ How about:

"This issue was observed specifically on an arm64 system with a GIC-v3
handling MSIs; GIC-v3 uses the handle_fasteoi_irq handler. The issue is
that the GIC-v3's physical LPIs do not have a global active state. If LPIs
had an active state, then an LPI could not be retriggered until the first
CPU had issued a deactivation."

>
> >
> > + /*
> > + * When the race described above happens, this will resend the interrupt.
> > + */
> > + if (unlikely(desc->istate & IRQS_PENDING))
> > + check_irq_resend(desc, false);
> > +
> > raw_spin_unlock(&desc->lock);
> > return;
> > out:
>
> While I'm glad that you eventually decided to use the resend mechanism
> instead of spinning on the "old" CPU, I still think imposing this
> behaviour on all users without any discrimination is wrong.
>
> Look at what it does if an interrupt is a wake-up source. You'd
> pointlessly requeue the interrupt (bonus points if the irqchip doesn't
> provide a HW-based retrigger mechanism).
>
> I still maintain that this change should only be applied for the
> particular interrupts that *require* it, and not as a blanket change
> affecting everything under the sun. I have proposed such a change in
> the past, feel free to use it or roll your own.

Thanks for the example of where this blanket functionality wouldn't be
desired - I'll re-work this to introduce and use
the IRQD_RESEND_WHEN_IN_PROGRESS flag as you originally suggested.

Just one more thing before I post V3: are you okay with doing the resend
here *after* the handler has finished running, using the IRQS_PENDING flag
to know when to resend it? Or would you like it to be resent in
the !irq_may_run(desc) block as you suggested?

I have a slight preference for doing it afterwards, only once we know the
interrupt is ready to be run again, so that check_irq_resend() does not
need to be modified to cater for multiple retries.
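
To make the ordering I have in mind concrete, here is a rough userspace
model of it. This is only a sketch, not the V3 patch: every name below
(struct model_desc, the MODEL_* bits, model_handle_fasteoi(),
model_race_demo()) is made up for illustration and none of it is kernel
code. The second CPU only marks the interrupt pending, and only when the
interrupt has opted in; the first CPU performs the single resend once its
handler has returned.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins for illustration; not kernel definitions. */
#define MODEL_INPROGRESS              (1u << 0)
#define MODEL_PENDING                 (1u << 1) /* plays the role of IRQS_PENDING */
#define MODEL_RESEND_WHEN_IN_PROGRESS (1u << 2) /* per-interrupt opt-in */

struct model_desc {
	unsigned int state;
	unsigned int handled;	/* times the handler body ran */
	unsigned int resends;	/* times the resend path was taken */
	void (*handler)(struct model_desc *desc);
};

/* Models the flow entry; may be "entered" again while the handler runs. */
static void model_handle_fasteoi(struct model_desc *desc)
{
	assert(desc != NULL);

	if (desc->state & MODEL_INPROGRESS) {
		/*
		 * Handler already running elsewhere: only remember the
		 * event, and only for interrupts that asked for it.
		 */
		if (desc->state & MODEL_RESEND_WHEN_IN_PROGRESS)
			desc->state |= MODEL_PENDING;
		return;
	}

	desc->state |= MODEL_INPROGRESS;
	desc->handled++;
	if (desc->handler)
		desc->handler(desc);
	desc->state &= ~MODEL_INPROGRESS;

	/* Resend once, after the handler finished, if the race happened. */
	if (desc->state & MODEL_PENDING) {
		desc->state &= ~MODEL_PENDING;
		desc->resends++;
	}
}

/* Simulates the same interrupt arriving on another CPU mid-handler. */
static void racing_handler(struct model_desc *desc)
{
	model_handle_fasteoi(desc);
}

/* Runs one raced delivery and reports how many resends it caused. */
static unsigned int model_race_demo(bool opt_in)
{
	struct model_desc desc = {
		.state = opt_in ? MODEL_RESEND_WHEN_IN_PROGRESS : 0,
		.handler = racing_handler,
	};

	model_handle_fasteoi(&desc);
	return desc.resends;
}
```

In this model the resend path runs at most once per handler invocation
and never while the handler is still in progress, which is the property
I am after. The same-thread re-entry in racing_handler() is just a cheap
way to imitate a second CPU; there is no real concurrency or locking
here.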

JG