[PATCH 0/3] Resend GIC-v3 LPIs on concurrent invoke

From: James Gowans
Date: Thu Jun 08 2023 - 08:02:16 EST


If interrupts do not have a global active state, the next interrupt can
arrive on a new CPU if an affinity change happens while the original CPU
is still running the handler. This specifically impacts GIC-v3.

In this series, generic functionality is added to handle_fasteoi_irq()
to support resending the interrupt when this race happens, and that
functionality is enabled specifically for GIC-v3, which is impacted by
this issue. GIC-v3 uses the handle_fasteoi_irq() generic handler, hence
that is the handler which gains the functionality.
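
The intended behaviour can be modelled in userspace. What follows is a
minimal sketch under stated assumptions, not the kernel code: struct
model_irq, its fields and the model_*() functions are hypothetical
stand-ins for the irq_desc state and the flow handler. A nested call
stands in for a second CPU entering the flow handler while the first is
still running the device handler, and the replay is done by calling the
handler again rather than by asking the hardware to resend.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified stand-in for the per-interrupt state. */
struct model_irq {
	bool in_progress;      /* a handler is currently running */
	bool pending;          /* an arrival was latched for replay */
	bool resend_when_busy; /* models the opt-in resend behaviour */
	int handled;           /* times the device handler ran */
	int lost;              /* arrivals that were dropped */
	int inject_nested;     /* concurrent arrivals still to simulate */
};

void model_handle_fasteoi(struct model_irq *irq);

/* Device handler: optionally simulates an arrival on another CPU. */
void model_device_handler(struct model_irq *irq)
{
	irq->handled++;
	if (irq->inject_nested > 0) {
		irq->inject_nested--;
		model_handle_fasteoi(irq);
	}
}

/* Flow handler: latch a concurrent arrival if opted in, else drop it. */
void model_handle_fasteoi(struct model_irq *irq)
{
	assert(irq != NULL);

	if (irq->in_progress) {
		if (irq->resend_when_busy)
			irq->pending = true;
		else
			irq->lost++;
		return;
	}

	irq->in_progress = true;
	model_device_handler(irq);
	irq->in_progress = false;

	/* Replay an arrival that was latched while the handler ran. */
	if (irq->pending) {
		irq->pending = false;
		model_handle_fasteoi(irq);
	}
}
```

With inject_nested set to 1, the model drops the concurrent arrival when
resend_when_busy is false, and runs the device handler a second time
when it is true.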

Also add a bit more detail to the IRQD flags docs to help future
readers know when and why the flags should be used and what they mean.

== Testing: ==

TL;DR: Run a VM using QEMU on an EC2 R6g.metal host with an ENA device
passed through using VFIO, and bounce IRQ affinity between two CPUs.
Before this change an interrupt can get lost and the device stalls;
after this change the interrupt is not lost.

=== Details: ===

Intentionally slow down the IRQ injection a bit, to turn this from a
rare race condition into something which can easily be flushed out in
testing:

@@ -763,6 +764,7 @@ int vgic_its_inject_cached_translation(struct kvm *kvm, struct kvm_msi *msi)
 	raw_spin_lock_irqsave(&irq->irq_lock, flags);
 	irq->pending_latch = true;
 	vgic_queue_irq_unlock(kvm, irq, flags);
+	udelay(10);

 	return 0;
 }

Also sprinkle a print to make it clear when the race described here is
hit:

@@ -698,6 +698,7 @@ void handle_fasteoi_irq(struct irq_desc *desc)
 	 * handling the previous one - it may need to be resent.
 	 */
 	if (!irq_may_run(desc)) {
+		printk("!irq_may_run %i\n", desc->irq_data.irq);
 		if (irqd_needs_resend_when_in_progress(&desc->irq_data))
 			desc->istate |= IRQS_PENDING;
 		goto out;

Launch QEMU in your favourite way, with an ENA device passed through via
VFIO (VFIO driver re-binding needs to be done before this):

qemu-system-aarch64 -enable-kvm -machine virt,gic_version=3 -device vfio-pci,host=04:00.0 ...

In the VM, generate network traffic to get interrupts flowing:

ping -f -i 0.001 10.0.3.1 > /dev/null

On the host, bounce the interrupt's affinity between two CPUs to flush
out the race:

while true; do
    echo 1 > /proc/irq/71/smp_affinity ; sleep 0.01;
    echo 2 > /proc/irq/71/smp_affinity ; sleep 0.01;
done
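
The values written to smp_affinity are hexadecimal CPU bitmasks, so 1
selects CPU 0 and 2 selects CPU 1. For anyone who would rather drive
this from a small C program, here is a minimal sketch. The helper names
cpu_to_affinity_mask() and write_affinity() are hypothetical, and only
CPUs below 32 are handled, so the comma-separated form used for larger
masks is not covered.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* Format the hex bitmask that selects a single CPU (cpu < 32 only). */
int cpu_to_affinity_mask(unsigned int cpu, char *buf, size_t len)
{
	assert(buf != NULL);

	if (cpu >= 32)
		return -1;

	int n = snprintf(buf, len, "%x", 1u << cpu);

	return (n < 0 || (size_t)n >= len) ? -1 : 0;
}

/* Write that mask to a file such as /proc/irq/71/smp_affinity. */
int write_affinity(const char *path, unsigned int cpu)
{
	char mask[16];
	FILE *f;
	int ret;

	if (cpu_to_affinity_mask(cpu, mask, sizeof(mask)) != 0)
		return -1;

	f = fopen(path, "w");
	if (!f)
		return -1;

	ret = (fprintf(f, "%s\n", mask) < 0) ? -1 : 0;
	if (fclose(f) != 0)
		ret = -1;

	return ret;
}
```

Alternating write_affinity("/proc/irq/71/smp_affinity", 0) and the same
call with CPU 1, with a short sleep in between, would mirror the shell
loop above. It needs root on the host.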

In host dmesg the printk indicates that the race is hit:

[ 102.215801] !irq_may_run 71
[ 105.426413] !irq_may_run 71
[ 105.586462] !irq_may_run 71

Before this change, an interrupt is lost and this manifests as a driver
watchdog timeout in the guest device driver:

[ 35.124441] ena 0000:00:02.0 enp0s2: Found a Tx that wasn't completed on time,...
...
[ 37.124459] ------------[ cut here ]------------
[ 37.124791] NETDEV WATCHDOG: enp0s2 (ena): transmit queue 0 timed out

After this change, even though the !irq_may_run print is still shown
(indicating that the race is still hit) the driver no longer times out
because the interrupt now gets resent when the race occurs.

James Gowans (3):
  genirq: Expand doc for PENDING and REPLAY flags
  genirq: fasteoi supports resend on concurrent invoke
  irqchip/gic-v3-its: Enable RESEND_WHEN_IN_PROGRESS for LPIs

 drivers/irqchip/irq-gic-v3-its.c |  2 ++
 include/linux/irq.h              | 13 +++++++++++++
 kernel/irq/chip.c                | 16 +++++++++++++++-
 kernel/irq/debugfs.c             |  2 ++
 kernel/irq/internals.h           |  7 +++++--
 5 files changed, 37 insertions(+), 3 deletions(-)


base-commit: 5f63595ebd82f56a2dd36ca013dd7f5ff2e2416a
--
2.25.1