Re: [PATCH] irq: fasteoi handler re-runs on concurrent invoke

From: Yipeng Zou
Date: Tue Apr 18 2023 - 23:08:39 EST

Next message: Martin K. Petersen: "Re: [PATCH] scsi: advansys: disallow ISA-only build"
Previous message: Haifeng Xu: "[PATCH v2 2/2] memcg, oom: remove explicit wakeup in mem_cgroup_oom_synchronize()"
In reply to: Gowans, James: "Re: [PATCH] irq: fasteoi handler re-runs on concurrent invoke"
Messages sorted by: [ date ] [ thread ] [ subject ] [ author ]

在 2023/4/18 18:56, Gowans, James 写道:

On Wed, 2023-04-12 at 14:32 +0100, Marc Zyngier wrote:

1. Do we need to mask the IRQ and then unmask it later? I don't think so
but it's not entirely clear why handle_edge_irq does this anyway; it's
an edge IRQ so not sure why it needs to be masked.

Please measure that cost and weep, specially in the context of
multiple concurrent interrupts serviced by a single ITS (cost of
locking the command queue, of waiting for a full round trip to the ITS
for a couple of commands...).

Fortunately this mask/unmasking would only happen in the rare(ish) cased of the
race condition described here being hit. Exactly the same as
with handle_edge_irq(), the masking and later unmasking would only be done
when irq_may_run() == false due to the race being hit. Considering that this is
a rare occurrence, I think we could stomach the occasional overhead? I was more
asking if it's actually *necessary* to do this masking/unmasking. I'm not sure
it's necessary anyway, hence it wasn't implemented in my patch.

But does it solve anything? At the point where you mask the interrupt,
you already have consumed it. You'd still need to make it pending
somehow, which is what your patch somehow.

I don't really know - the reason I asked the question is that the related
handle_edge_irq() does this mask/unmasking, and I wasn't quite sure why it
did that and hence if we needed to do something similar.
Anyway, let's focus on your patch rather - I think it's more compelling.

Yes. This bothered me too initially, but on reflection I'm not sure it's
actually a problem. One possible issue that came to mind was around CPU
offlining, but in the event that a CPU being offlined was running interrupt
handlers it wouldn't be able to complete the offline anyway until the handlers
were finished, so I don't think this is an issue. Do you see any practical issue
with running the handler once more on the original CPU immediately after the
affinity has been changed?

My take on this is that we put the pressure on the CPU we want to move
away from. I'd rather we put the it on the GIC itself, and use its
Turing-complete powers to force it to redeliver the interrupt at a
more convenient time.

This idea and implementation looks and works great! It may need a few
tweaks; discussing below.

>From c96d2ab37fe273724f1264fba5f4913259875d56 Mon Sep 17 00:00:00 2001
From: Marc Zyngier<maz@xxxxxxxxxx>
Date: Mon, 10 Apr 2023 10:56:32 +0100
Subject: [PATCH] irqchip/gicv3-its: Force resend of LPIs taken while
already
in-progress

Perhaps you can pillage some of my commit message to explain the race here
when you send this patch?

Signed-off-by: Marc Zyngier<maz@xxxxxxxxxx>

diff --git a/include/linux/irq.h b/include/linux/irq.h
index b1b28affb32a..4b2a7cc96eb2 100644
--- a/include/linux/irq.h
+++ b/include/linux/irq.h
@@ -223,6 +223,8 @@ struct irq_data {
* irq_chip::irq_set_affinity() when
deactivated.
* IRQD_IRQ_ENABLED_ON_SUSPEND - Interrupt is enabled on suspend by irq
pm if
* irqchip have flag
IRQCHIP_ENABLE_WAKEUP_ON_SUSPEND set.
+ * IRQD_RESEND_WHEN_IN_PROGRESS - Interrupt may fire when already in
progress,
+ * needs resending.
*/
enum {
IRQD_TRIGGER_MASK = 0xf,
@@ -249,6 +251,7 @@ enum {
IRQD_HANDLE_ENFORCE_IRQCTX = (1 << 28),
IRQD_AFFINITY_ON_ACTIVATE = (1 << 29),
IRQD_IRQ_ENABLED_ON_SUSPEND = (1 << 30),
+ IRQD_RESEND_WHEN_IN_PROGRESS = (1 << 31),
};

Do we really want a new flag here? I'd be keen to fix this race for all
drivers, not just those who know to set this flag. I think the patch
you're suggesting is pretty close to being safe to enable generally? If so
my preference is for one less config option - just run it always.

static inline irq_hw_number_t irqd_to_hwirq(struct irq_data *d)
diff --git a/kernel/irq/chip.c b/kernel/irq/chip.c
index 49e7bc871fec..73546ba8bc43 100644
--- a/kernel/irq/chip.c
+++ b/kernel/irq/chip.c
@@ -692,8 +692,11 @@ void handle_fasteoi_irq(struct irq_desc *desc)

raw_spin_lock(&desc->lock);

- if (!irq_may_run(desc))
+ if (!irq_may_run(desc)) {
+ if (irqd_needs_resend_when_in_progress(&desc->irq_data))
+ check_irq_resend(desc, true);
goto out;
+ }

This will run check_irq_resend() on the *newly affined* CPU, while the old
one is still running the original handler. AFAICT what will happen is:
check_irq_resend
try_retrigger
irq_chip_retrigger_hierarchy
its_irq_retrigger
... which will cause the ITS to *immediately* re-trigger the IRQ. The
original CPU can still be running the handler in that case.

If that happens, consider what will happen in check_irq_resend:
- first IRQ comes in, successflly runs try_retrigger and sets IRQS_REPLAY.
- it is *immediately* retriggered by ITS, and because the original handler
on the other CPU is still running, comes into check_irq_resend again.
- check_irq_resend now observes that IRQS_REPLAY is set and early outs.
- No more resends, the IRQ is still lost. :-(

Now I admit the failure mode is getting a bit pathological: two re-
triggers while the original handler is still running, but I was able to
hit this on my test machine by intentionally slowing
the handler down by a few dozen micros. Should we cater for this?

I can see two possibilities:
- tweak check_irq_resend() to not early-out in this case but to keep re-
triggering until it eventually runs.
- move the check_irq_resend to only happen later, *after* the original
handler has finished running. This would be very similar to what I
suggested in my original patch, except instead of running a do/while loop,
the code would observe that the pending flag was set again and run
check_irq_resend.

I'm also wondering what will happen for users who don't have the
chip->irq_retrigger callback set and fall back to the tasklet
via irq_sw_resend()... Looks like it will work fine. However if we do my
suggestion and move check_irq_resend to the end of handle_fasteoi_irq then
the tasklet will be scheduled on the old CPU again, which may be sub-
optimal.

JG

Hi James:

    This does have a problem, and I didn't hit this on my test machine because my interrupt

handling would be ended quickly. But this scenario really should be considered.

    However, if the check_irq_resend is executed in the end of the handle_fasteoi_irq, a flag

is needed to identify it. Same as the original IRQS_PENDING function.

    It's just that the do/while loop in our original patch is replaced with check_irq_resend.

    As for the irq_sw_resend that will be followed without implementing chip->irq_retrigger,

I think maybe there will be another problem here: interrupt response latency.

    If an irq_sw_resend is used to trigger a new interrupt, the delay of this interrupt response

is uncertain (tasklet scheduling, etc.), which is important for some devices with low latency

requirements There may also be an impact.

--
Regards,
Yipeng Zou

Next message: Martin K. Petersen: "Re: [PATCH] scsi: advansys: disallow ISA-only build"
Previous message: Haifeng Xu: "[PATCH v2 2/2] memcg, oom: remove explicit wakeup in mem_cgroup_oom_synchronize()"
In reply to: Gowans, James: "Re: [PATCH] irq: fasteoi handler re-runs on concurrent invoke"
Messages sorted by: [ date ] [ thread ] [ subject ] [ author ]