Re: [PATCH v2 0/5] KVM: Fix oneshot interrupts forwarding

From: Dmytro Maluka
Date: Sat Aug 13 2022 - 10:12:47 EST


Hi Rong,

On 8/12/22 12:40 AM, Liu, Rong L wrote:
> Hi Paolo and Dmytro,
>
>> -----Original Message-----
>> From: Paolo Bonzini <pbonzini@xxxxxxxxxx>
>> Sent: Wednesday, August 10, 2022 11:48 PM
>> To: Dmytro Maluka <dmy@xxxxxxxxxxxx>; Marc Zyngier
>> <maz@xxxxxxxxxx>; eric.auger@xxxxxxxxxx
>> Cc: Dong, Eddie <eddie.dong@xxxxxxxxx>; Christopherson,, Sean
>> <seanjc@xxxxxxxxxx>; kvm@xxxxxxxxxxxxxxx; Thomas Gleixner
>> <tglx@xxxxxxxxxxxxx>; Ingo Molnar <mingo@xxxxxxxxxx>; Borislav
>> Petkov <bp@xxxxxxxxx>; Dave Hansen <dave.hansen@xxxxxxxxxxxxxxx>;
>> x86@xxxxxxxxxx; H. Peter Anvin <hpa@xxxxxxxxx>; linux-
>> kernel@xxxxxxxxxxxxxxx; Alex Williamson <alex.williamson@xxxxxxxxxx>;
>> Liu, Rong L <rong.l.liu@xxxxxxxxx>; Zhenyu Wang
>> <zhenyuw@xxxxxxxxxxxxxxx>; Tomasz Nowicki <tn@xxxxxxxxxxxx>;
>> Grzegorz Jaszczyk <jaz@xxxxxxxxxxxx>; upstream@xxxxxxxxxxxx;
>> Dmitry Torokhov <dtor@xxxxxxxxxx>
>> Subject: Re: [PATCH v2 0/5] KVM: Fix oneshot interrupts forwarding
>>
>> On 8/10/22 19:02, Dmytro Maluka wrote:
>>> 1. If vEOI happens for a masked vIRQ, notify resamplefd as usual,
>>> but also remember this vIRQ as, let's call it, "pending oneshot".
>>>
>
> This is the part that always confuses me. In the x86 case, for a level-triggered
> interrupt, even if it is not oneshot, there is still an "unmask", and the unmask
> happens in the same sequence as for a oneshot interrupt; only the timing differs.
> So are you going to differentiate oneshot from "normal" level-triggered
> interrupts or not? And is there any situation where vEOI happens for an unmasked
> vIRQ?

We already discussed this in [1] and earlier. It still seems to me
that your statement is wrong: with the x86 ioapic, "normal"
level-triggered interrupts normally stay unmasked all the time, and only
EOI is used for interrupt completion. To double-check that, I once
traced KVM's ioapic_write_indirect() and confirmed that it is not called
when a Linux guest is handling a "normal" level-triggered interrupt.

However, it seems that even if you were right and for normal interrupts
an EOI was always followed by an unmask, this proposal would still work
correctly.

>
>>> 2. A new physical IRQ is immediately generated, so the vIRQ is
>>> properly set as pending.
>>>
>
> I am not sure this is always the case. For example, a device may not raise a
> new interrupt until it is notified that "done reading" - by device driver
> writing to a register or something when device driver finishes reading data. So
> how do you handle this situation?

Right, the device will not raise new interrupts, but it also will not
lower the currently pending interrupt until "done reading". Precisely
for this reason, the host will receive a new interrupt immediately after
vfio unmasks the physical IRQ.

It's also possible that the driver will notify "done reading" quite
early, so the device will lower the interrupt before vfio unmasks it and
no new physical interrupt will be generated. That is fine too, since it
means that the physical IRQ is no longer pending, so we don't need to
notify KVM to set the virtual IRQ status to "pending".
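
To illustrate the two cases above, here is a rough sketch of a
level-triggered line as seen by the host. It is a toy model, not vfio or
irqchip code: struct host_line and all function names are made up for
illustration, and it assumes that the host handler masks the line again
each time it takes the interrupt.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified model of one level-triggered host IRQ line. */
struct host_line {
	bool device_asserted; /* device holds the line high until "done reading" */
	bool host_masked;     /* physical IRQ is currently masked in the host */
	int  host_irqs;       /* number of physical interrupts the host has taken */
};

/* An unmasked line that is still asserted fires; otherwise nothing happens. */
static void line_evaluate(struct host_line *l)
{
	assert(l != NULL);
	if (l->device_asserted && !l->host_masked) {
		l->host_irqs++;
		/* Assumption: the host handler masks the line again on receipt. */
		l->host_masked = true;
	}
}

/* Device raises its interrupt line. */
static void device_raise(struct host_line *l)
{
	l->device_asserted = true;
	line_evaluate(l);
}

/* Driver tells the device it is "done reading"; the device lowers the line. */
static void driver_done_reading(struct host_line *l)
{
	l->device_asserted = false;
}

/* Host unmasks the physical IRQ (e.g. in response to a resample notification). */
static void host_unmask(struct host_line *l)
{
	l->host_masked = false;
	line_evaluate(l);
}
```

If host_unmask() runs before driver_done_reading(), the counter goes up
once more (a new physical interrupt). If driver_done_reading() runs first,
the unmask generates nothing, which matches the "that is fine too" case.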

>
>>> 3. After the vIRQ is unmasked by the guest, check and find out that
>>> it is not just pending but also "pending oneshot", so don't
>>> deliver it to a vCPU. Instead, immediately notify resamplefd once
>>> again.
>>>
>
> Does this mean changing the vfio code as well? That seems to be the case: vfio
> seems to keep its own internal "state" of whether the irq is enabled or not.

I don't quite get why it would require changing vfio. Could you
elaborate?
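
For reference, the three proposed steps quoted above could be modelled
roughly as follows. This is only a sketch of the idea on the KVM side:
struct virq_model, its fields and the virq_*() helpers are hypothetical
names, not existing KVM structures or functions, and signalling resamplefd
is reduced to incrementing a counter.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model of one level-triggered virtual IRQ. */
struct virq_model {
	bool masked;          /* guest has masked the vIRQ */
	bool pending;         /* vIRQ is pending */
	bool in_service;      /* delivered to a vCPU, not yet EOI'd */
	bool pending_oneshot; /* vEOI arrived while the vIRQ was masked */
	int  deliveries;             /* injections into a vCPU */
	int  resample_notifications; /* times resamplefd would be signalled */
};

/* Deliver only if pending, unmasked and not already in service. */
static void virq_try_deliver(struct virq_model *v)
{
	assert(v != NULL);
	if (v->pending && !v->masked && !v->in_service) {
		v->in_service = true;
		v->deliveries++;
	}
	assert(!v->in_service || v->pending); /* in service implies pending */
}

/* Step 2: a physical IRQ arrived in the host, so mark the vIRQ pending. */
static void virq_set_pending(struct virq_model *v)
{
	v->pending = true;
	virq_try_deliver(v);
}

static void virq_mask(struct virq_model *v)
{
	v->masked = true;
}

/* Step 1: on vEOI notify resamplefd as usual; remember EOI-while-masked. */
static void virq_eoi(struct virq_model *v)
{
	v->in_service = false;
	v->pending = false;
	if (v->masked)
		v->pending_oneshot = true;
	v->resample_notifications++;
}

/* Step 3: on unmask, a "pending oneshot" vIRQ is resampled, not delivered. */
static void virq_unmask(struct virq_model *v)
{
	v->masked = false;
	if (v->pending_oneshot) {
		v->pending_oneshot = false;
		if (v->pending) {
			v->pending = false;
			v->resample_notifications++;
			return;
		}
	}
	virq_try_deliver(v);
}
```

In a oneshot sequence (deliver, mask, EOI, new physical IRQ, unmask) this
model delivers the interrupt to the guest once and notifies resamplefd
twice, so the extra physical interrupt only refreshes the pending state.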

[1] https://lore.kernel.org/kvm/9054d9f9-f41e-05c7-ce8d-628a6c827c40@xxxxxxxxxxxx/

Thanks,
Dmytro

>
> Thanks,
>
> Rong
>>> In other words, don't avoid extra physical interrupts in the host
>>> (rather, use those extra interrupts for properly updating the pending
>>> state of the vIRQ) but avoid propagating those extra interrupts to the
>>> guest.
>>>
>>> Does this sound reasonable to you?
>>
>> Yeah, this makes sense and it lets the resamplefd set the "pending"
>> status in the vGIC. It still has the issue that the interrupt can
>> remain pending in the guest for longer than it's pending on the host,
>> but that can't be fixed?
>>
>> Paolo
>