Re: [PATCH] irq: fasteoi handler re-runs on concurrent invoke

From: Gowans, James
Date: Tue Apr 18 2023 - 06:56:19 EST


On Wed, 2023-04-12 at 14:32 +0100, Marc Zyngier wrote:
>
> > > > 1. Do we need to mask the IRQ and then unmask it later? I don't think so
> > > > but it's not entirely clear why handle_edge_irq does this anyway; it's
> > > > an edge IRQ so not sure why it needs to be masked.
> > >
> > > Please measure that cost and weep, specially in the context of
> > > multiple concurrent interrupts serviced by a single ITS (cost of
> > > locking the command queue, of waiting for a full round trip to the ITS
> > > for a couple of commands...).
> >
> > Fortunately this mask/unmasking would only happen in the rare(ish) cased of the
> > race condition described here being hit. Exactly the same as
> > with handle_edge_irq(), the masking and later unmasking would only be done
> > when irq_may_run() == false due to the race being hit. Considering that this is
> > a rare occurrence, I think we could stomach the occasional overhead? I was more
> > asking if it's actually *necessary* to do this masking/unmasking. I'm not sure
> > it's necessary anyway, hence it wasn't implemented in my patch.
>
> But does it solve anything? At the point where you mask the interrupt,
> you already have consumed it. You'd still need to make it pending
> somehow, which is what your patch somehow.

I don't really know - the reason I asked the question is that the related
handle_edge_irq() does this mask/unmasking, and I wasn't quite sure why it
did that and hence if we needed to do something similar.
Anyway, let's focus on your patch rather - I think it's more compelling.


> > Yes. This bothered me too initially, but on reflection I'm not sure it's
> > actually a problem. One possible issue that came to mind was around CPU
> > offlining, but in the event that a CPU being offlined was running interrupt
> > handlers it wouldn't be able to complete the offline anyway until the handlers
> > were finished, so I don't think this is an issue. Do you see any practical issue
> > with running the handler once more on the original CPU immediately after the
> > affinity has been changed?
>
> My take on this is that we put the pressure on the CPU we want to move
> away from. I'd rather we put the it on the GIC itself, and use its
> Turing-complete powers to force it to redeliver the interrupt at a
> more convenient time.

This idea and implementation looks and works great! It may need a few
tweaks; discussing below.

>
> From c96d2ab37fe273724f1264fba5f4913259875d56 Mon Sep 17 00:00:00 2001
> From: Marc Zyngier <maz@xxxxxxxxxx>
> Date: Mon, 10 Apr 2023 10:56:32 +0100
> Subject: [PATCH] irqchip/gicv3-its: Force resend of LPIs taken while
> already
> in-progress

Perhaps you can pillage some of my commit message to explain the race here
when you send this patch?
>
> Signed-off-by: Marc Zyngier <maz@xxxxxxxxxx>
>
> diff --git a/include/linux/irq.h b/include/linux/irq.h
> index b1b28affb32a..4b2a7cc96eb2 100644
> --- a/include/linux/irq.h
> +++ b/include/linux/irq.h
> @@ -223,6 +223,8 @@ struct irq_data {
> * irq_chip::irq_set_affinity() when
> deactivated.
> * IRQD_IRQ_ENABLED_ON_SUSPEND - Interrupt is enabled on suspend by irq
> pm if
> * irqchip have flag
> IRQCHIP_ENABLE_WAKEUP_ON_SUSPEND set.
> + * IRQD_RESEND_WHEN_IN_PROGRESS - Interrupt may fire when already in
> progress,
> + * needs resending.
> */
> enum {
> IRQD_TRIGGER_MASK = 0xf,
> @@ -249,6 +251,7 @@ enum {
> IRQD_HANDLE_ENFORCE_IRQCTX = (1 << 28),
> IRQD_AFFINITY_ON_ACTIVATE = (1 << 29),
> IRQD_IRQ_ENABLED_ON_SUSPEND = (1 << 30),
> + IRQD_RESEND_WHEN_IN_PROGRESS = (1 << 31),
> };

Do we really want a new flag here? I'd be keen to fix this race for all
drivers, not just those who know to set this flag. I think the patch
you're suggesting is pretty close to being safe to enable generally? If so
my preference is for one less config option - just run it always.

> static inline irq_hw_number_t irqd_to_hwirq(struct irq_data *d)
> diff --git a/kernel/irq/chip.c b/kernel/irq/chip.c
> index 49e7bc871fec..73546ba8bc43 100644
> --- a/kernel/irq/chip.c
> +++ b/kernel/irq/chip.c
> @@ -692,8 +692,11 @@ void handle_fasteoi_irq(struct irq_desc *desc)
>
> raw_spin_lock(&desc->lock);
>
> - if (!irq_may_run(desc))
> + if (!irq_may_run(desc)) {
> + if (irqd_needs_resend_when_in_progress(&desc->irq_data))
> + check_irq_resend(desc, true);
> goto out;
> + }

This will run check_irq_resend() on the *newly affined* CPU, while the old
one is still running the original handler. AFAICT what will happen is:
check_irq_resend
try_retrigger
irq_chip_retrigger_hierarchy
its_irq_retrigger
... which will cause the ITS to *immediately* re-trigger the IRQ. The
original CPU can still be running the handler in that case.

If that happens, consider what will happen in check_irq_resend:
- first IRQ comes in, successflly runs try_retrigger and sets IRQS_REPLAY.
- it is *immediately* retriggered by ITS, and because the original handler
on the other CPU is still running, comes into check_irq_resend again.
- check_irq_resend now observes that IRQS_REPLAY is set and early outs.
- No more resends, the IRQ is still lost. :-(

Now I admit the failure mode is getting a bit pathological: two re-
triggers while the original handler is still running, but I was able to
hit this on my test machine by intentionally slowing
the handler down by a few dozen micros. Should we cater for this?

I can see two possibilities:
- tweak check_irq_resend() to not early-out in this case but to keep re-
triggering until it eventually runs.
- move the check_irq_resend to only happen later, *after* the original
handler has finished running. This would be very similar to what I
suggested in my original patch, except instead of running a do/while loop,
the code would observe that the pending flag was set again and run
check_irq_resend.

I'm also wondering what will happen for users who don't have the
chip->irq_retrigger callback set and fall back to the tasklet
via irq_sw_resend()... Looks like it will work fine. However if we do my
suggestion and move check_irq_resend to the end of handle_fasteoi_irq then
the tasklet will be scheduled on the old CPU again, which may be sub-
optimal.

JG