Re: [PATCH 0/3] Resend GIC-v3 LPIs on concurrent invoke

From: Gowans, James
Date: Fri Jun 16 2023 - 04:32:41 EST


Hi Marc and Thomas,
Just a ping on this series; would be great to get any more feedback, or
get this merged.

Thanks!
James

On Thu, 2023-06-08 at 14:00 +0200, James Gowans wrote:
> If interrupts do not have global active states it is possible for
> the next interrupt to arrive on a new CPU if an affinity change happens
> while the original CPU is still running the handler. This specifically
> impacts GIC-v3.
>
> In this series, generic functionality is added to handle_fasteoi_irq()
> to support resending the interrupt when this race happens, and that
> generic functionality is enabled specifically for GIC-v3, which is
> impacted by this issue. GIC-v3 uses the handle_fasteoi_irq() generic
> handler, hence that is the handler getting the functionality.
>
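
For anyone skimming the thread, the race and the resend can be modelled
roughly in user space as below. This is only a sketch of the idea, not the
kernel code: struct model_irq, model_irq_enter(), model_irq_exit() and
simulate_race() are hypothetical names made up for illustration, and the
real handler does considerably more (locking, masking, EOI).

```c
#include <stdbool.h>

/* Hypothetical, simplified stand-in for per-interrupt state. */
struct model_irq {
	bool in_progress;             /* a CPU is currently running the handler */
	bool pending;                 /* an interrupt arrived while in progress */
	bool resend_when_in_progress; /* models the new opt-in flag */
	int handler_runs;             /* how many times the handler completed */
};

/* Returns true if the calling CPU may run the handler. */
static bool model_irq_enter(struct model_irq *irq)
{
	if (irq->in_progress) {
		/* Concurrent invoke: remember it only if the flag is set. */
		if (irq->resend_when_in_progress)
			irq->pending = true;
		return false;
	}
	irq->in_progress = true;
	return true;
}

/* Finishes the handler; returns true if a resend is required. */
static bool model_irq_exit(struct model_irq *irq)
{
	irq->handler_runs++;
	irq->in_progress = false;
	if (irq->pending) {
		irq->pending = false;
		return true;
	}
	return false;
}

/*
 * Two interrupts are raised; the second lands on another CPU while the
 * first is still being handled. Returns the number of handler runs.
 */
int simulate_race(bool resend_when_in_progress)
{
	struct model_irq irq = {
		.resend_when_in_progress = resend_when_in_progress,
	};

	bool cpu0_runs = model_irq_enter(&irq); /* first interrupt, CPU 0 */
	bool cpu1_runs = model_irq_enter(&irq); /* second interrupt, CPU 1 */

	if (cpu1_runs)
		model_irq_exit(&irq); /* not reached in this interleaving */

	if (cpu0_runs && model_irq_exit(&irq)) {
		/* Resent interrupt is delivered and handled normally. */
		if (model_irq_enter(&irq))
			model_irq_exit(&irq);
	}
	return irq.handler_runs;
}
```

In this model simulate_race(false) completes one handler run for two
interrupts (the second is lost), while simulate_race(true) completes two.
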
> Also adding a bit more detail to the IRQD flags docs to help future
> readers know when/why flags should be used and what they mean.
>
> == Testing: ==
>
> TL;DR: Run a virt machine using QEMU on an EC2 R6g.metal host with an
> ENA device passed through using VFIO, and bounce IRQ affinity between
> two CPUs. Before this change an interrupt can get lost and the device
> stalls; after this change the interrupt is not lost.
>
> === Details: ===
>
> Intentionally slow down the IRQ injection a bit, to turn this from a
> rare race condition into something which can easily be flushed out
> in testing:
>
> @@ -763,6 +764,7 @@ int vgic_its_inject_cached_translation(struct kvm *kvm, struct kvm_msi *msi)
>          raw_spin_lock_irqsave(&irq->irq_lock, flags);
>          irq->pending_latch = true;
>          vgic_queue_irq_unlock(kvm, irq, flags);
> +        udelay(10);
>
>          return 0;
>  }
>
> Also sprinkle a print to make it clear when the race described here is
> hit:
>
> @@ -698,6 +698,7 @@ void handle_fasteoi_irq(struct irq_desc *desc)
>           * handling the previous one - it may need to be resent.
>           */
>          if (!irq_may_run(desc)) {
> +                printk("!irq_may_run %i\n", desc->irq_data.irq);
>                  if (irqd_needs_resend_when_in_progress(&desc->irq_data))
>                          desc->istate |= IRQS_PENDING;
>                  goto out;
>
> Launch QEMU in your favourite way, with an ENA device passed through via
> VFIO (VFIO driver re-binding needs to be done before this):
>
> qemu-system-aarch64 -enable-kvm -machine virt,gic-version=3 -device vfio-pci,host=04:00.0 ...
>
> In the VM, generate network traffic to get interrupts flowing:
>
> ping -f -i 0.001 10.0.3.1 > /dev/null
>
> On the host, bounce the affinity of the interrupt between two CPUs to flush out the race:
>
> while true; do
>     echo 1 > /proc/irq/71/smp_affinity ; sleep 0.01;
>     echo 2 > /proc/irq/71/smp_affinity ; sleep 0.01;
> done
>
> In host dmesg the printk indicates that the race is hit:
>
> [ 102.215801] !irq_may_run 71
> [ 105.426413] !irq_may_run 71
> [ 105.586462] !irq_may_run 71
>
> Before this change, an interrupt is lost and this manifests as a driver
> watchdog timeout in the guest device driver:
>
> [ 35.124441] ena 0000:00:02.0 enp0s2: Found a Tx that wasn't completed on time,...
> ...
> [ 37.124459] ------------[ cut here ]------------
> [ 37.124791] NETDEV WATCHDOG: enp0s2 (ena): transmit queue 0 timed out
>
> After this change, even though the !irq_may_run print is still shown
> (indicating that the race is still hit) the driver no longer times out
> because the interrupt now gets resent when the race occurs.
>
> James Gowans (3):
>   genirq: Expand doc for PENDING and REPLAY flags
>   genirq: fasteoi supports resend on concurrent invoke
>   irqchip/gic-v3-its: Enable RESEND_WHEN_IN_PROGRESS for LPIs
>
>  drivers/irqchip/irq-gic-v3-its.c |  2 ++
>  include/linux/irq.h              | 13 +++++++++++++
>  kernel/irq/chip.c                | 16 +++++++++++++++-
>  kernel/irq/debugfs.c             |  2 ++
>  kernel/irq/internals.h           |  7 +++++--
>  5 files changed, 37 insertions(+), 3 deletions(-)
>
>
> base-commit: 5f63595ebd82f56a2dd36ca013dd7f5ff2e2416a