Re: [PATCH v2 0/5] KVM: Fix oneshot interrupts forwarding

From: Marc Zyngier
Date: Wed Aug 10 2022 - 09:02:34 EST


On Wed, 10 Aug 2022 09:12:18 +0100,
Eric Auger <eric.auger@xxxxxxxxxx> wrote:
>
> Hi Marc,
>
> On 8/10/22 08:51, Marc Zyngier wrote:
> > On Wed, 10 Aug 2022 00:30:29 +0100,
> > Dmytro Maluka <dmy@xxxxxxxxxxxx> wrote:
> >> On 8/9/22 10:01 PM, Dong, Eddie wrote:
> >>>
> >>>> -----Original Message-----
> >>>> From: Dmytro Maluka <dmy@xxxxxxxxxxxx>
> >>>> Sent: Tuesday, August 9, 2022 12:24 AM
> >>>> To: Dong, Eddie <eddie.dong@xxxxxxxxx>; Christopherson,, Sean
> >>>> <seanjc@xxxxxxxxxx>; Paolo Bonzini <pbonzini@xxxxxxxxxx>;
> >>>> kvm@xxxxxxxxxxxxxxx
> >>>> Cc: Thomas Gleixner <tglx@xxxxxxxxxxxxx>; Ingo Molnar <mingo@xxxxxxxxxx>;
> >>>> Borislav Petkov <bp@xxxxxxxxx>; Dave Hansen <dave.hansen@xxxxxxxxxxxxxxx>;
> >>>> x86@xxxxxxxxxx; H. Peter Anvin <hpa@xxxxxxxxx>; linux-
> >>>> kernel@xxxxxxxxxxxxxxx; Eric Auger <eric.auger@xxxxxxxxxx>; Alex
> >>>> Williamson <alex.williamson@xxxxxxxxxx>; Liu, Rong L <rong.l.liu@xxxxxxxxx>;
> >>>> Zhenyu Wang <zhenyuw@xxxxxxxxxxxxxxx>; Tomasz Nowicki
> >>>> <tn@xxxxxxxxxxxx>; Grzegorz Jaszczyk <jaz@xxxxxxxxxxxx>;
> >>>> upstream@xxxxxxxxxxxx; Dmitry Torokhov <dtor@xxxxxxxxxx>
> >>>> Subject: Re: [PATCH v2 0/5] KVM: Fix oneshot interrupts forwarding
> >>>>
> >>>> On 8/9/22 1:26 AM, Dong, Eddie wrote:
> >>>>>> The existing KVM mechanism for forwarding of level-triggered
> >>>>>> interrupts using resample eventfd doesn't work quite correctly in the
> >>>>>> case of interrupts that are handled in a Linux guest as oneshot
> >>>>>> interrupts (IRQF_ONESHOT). Such an interrupt is acked to the device
> >>>>>> in its threaded irq handler, i.e. later than it is acked to the
> >>>>>> interrupt controller (EOI at the end of hardirq), not earlier. The
> >>>>>> existing KVM code doesn't take that into account, which results in
> >>>>>> erroneous extra interrupts in the guest caused by premature re-assert of an
> >>>> unacknowledged IRQ by the host.
> >>>>> Interesting... How it behaviors in native side?
> >>>> In native it behaves correctly, since Linux masks such a oneshot interrupt at the
> >>>> beginning of hardirq, so that the EOI at the end of hardirq doesn't result in its
> >>>> immediate re-assert, and then unmasks it later, after its threaded irq handler
> >>>> completes.
> >>>>
> >>>> In handle_fasteoi_irq():
> >>>>
> >>>> if (desc->istate & IRQS_ONESHOT)
> >>>> mask_irq(desc);
> >>>>
> >>>> handle_irq_event(desc);
> >>>>
> >>>> cond_unmask_eoi_irq(desc, chip);
> >>>>
> >>>>
> >>>> and later in unmask_threaded_irq():
> >>>>
> >>>> unmask_irq(desc);
> >>>>
> >>>> I also mentioned that in patch #3 description:
> >>>> "Linux keeps such interrupt masked until its threaded handler finishes, to
> >>>> prevent the EOI from re-asserting an unacknowledged interrupt.
> >>> That makes sense. Can you include the full story in cover letter too?
> >> Ok, I will.
> >>
> >>>
> >>>> However, with KVM + vfio (or whatever is listening on the resamplefd) we don't
> >>>> check that the interrupt is still masked in the guest at the moment of EOI.
> >>>> Resamplefd is notified regardless, so vfio prematurely unmasks the host
> >>>> physical IRQ, thus a new (unwanted) physical interrupt is generated in the host
> >>>> and queued for injection to the guest."
> > Sorry to barge in pretty late in the conversation (just been Cc'd on
> > this), but why shouldn't the resamplefd be notified? If there has been
> yeah sorry to get you involved here ;-)

No problem!

> > an EOI, a new level must be made visible to the guest interrupt
> > controller, no matter what the state of the interrupt masking is.
> >
> > Whether this new level is actually *presented* to a vCPU is another
> > matter entirely, and is arguably a problem for the interrupt
> > controller emulation.
>
> FWIU on guest EOI the physical line is still asserted so the pIRQ is
> immediatly re-sampled by the interrupt controller (because the
> resamplefd unmasked the physical IRQ) and recorded as a guest IRQ
> (although it is masked at guest level). When the guest actually unmasks
> the vIRQ we do not get a chance to re-evaluate the physical line level.

Indeed, and maybe this is what should be fixed instead of moving the
resampling point around (I was suggesting something along these lines
in [1]).

We already do this on arm64 for the timer, and it should be easy
enough it generalise to any interrupt backed by the GIC (there is an
in-kernel API to sample the pending state). No idea how that translate
for other architectures though.

M.

[1] https://lore.kernel.org/r/87mtccbie4.wl-maz@xxxxxxxxxx

--
Without deviation from the norm, progress is not possible.