Re: [PATCHv5 0/8] Virtual NMI feature

From: Sean Christopherson
Date: Wed Nov 16 2022 - 11:59:32 EST


On Wed, Nov 16, 2022, Tom Lendacky wrote:
> On 11/16/22 09:44, Santosh Shukla wrote:
> > Hello Maxim,.
> >
> > On 11/16/2022 2:51 PM, Maxim Levitsky wrote:
> > > On Wed, 2022-11-16 at 11:10 +0530, Santosh Shukla wrote:
> > > > Hi Maxim,
> > > >
> > > > On 11/14/2022 8:01 PM, Maxim Levitsky wrote:
> > > > > On Mon, 2022-11-14 at 13:32 +0530, Santosh Shukla wrote:
> > > > > I started reviewing it today and I think there are still few issues,
> > > > > and the biggest one is that if a NMI arrives while vNMI injection
> > > > > is pending, current code just drops such NMI.

I don't think it gets dropped, just improperly delayed.

> > > > > We had a discussion about this, like forcing immeditate vm exit
> > > >
> > > > I believe, We discussed above case in [1] i.e.. HW can handle
> > > > the second (/pending)virtual NMI while the guest processing first virtual NMI w/o vmexit.
> > > > is it same scenario or different one that you are mentioning?
> > > >
> > > > [1] https://lore.kernel.org/lkml/1782cdbb-8274-8c3d-fa98-29147f1e5d1e@xxxxxxx/>>
> > > You misunderstood the issue.
> > >
> > > Hardware can handle the case when a NMI is in service (that is V_NMI_MASK
> > > is set) and another one is injected (V_NMI_PENDING can be set),
> > >
> > > but it is not possible to handle the case when a NMI is already injected
> > > (V_NMI_PENDING set) but and KVM wants to inject another one before the
> > > first one went into the service (that is V_NMI_MASK is not set yet).
> >
> > In this case, HW will collapse the NMI.

Yes, but practically speaking two NMIs can't arrive at the exact same instance on
bare metal. One NMI will always arrive first and get vectored, and then the second
NMI will arrive and be pended. In a virtual environment, two NMIs that are sent
far apart can arrive together, e.g. if the vCPU is scheduled out for an extended
period of time. KVM mitigates this side effect by allowing two NMIs to be pending.

The problem here isn't that second the NMI is lost, it's that KVM can't get control
when the first NMI completes (IRETs). KVM can pend both NMIs and queue the first
for injection/vectoring (set V_NMI_PENDING), but AIUI there is no mechanism (and no
code) to force a VM-Exit on the IRET so that KVM can inject the second NMI.

Santosh's response in v2[*] suggested that hardware would allow KVM to "post" an
NMI while the vCPU is running, but I don't see any code in this series to actually
do that. svm_inject_nmi() is only called from the vCPU's run loop, i.e. requires
a VM-Exit.

[*] https://lore.kernel.org/all/1782cdbb-8274-8c3d-fa98-29147f1e5d1e@xxxxxxx

> > Note that the HW will take the pending NMI at the boundary of IRET instruction such that
> > it will check for the V_NMI_PENDING and if its set then HW will *take* the NMI,
> > HW will clear the V_NMI_PENDING bit and set the V_NMI_MASK w/o the VMEXIT!,.
> >
> >
> > > Also same can happen when NMIs are blocked in SMM, since V_NMI_MASK is
> > > set despite no NMI in service, we will be able to inject only one NMI by
> > > setting the V_NMI_PENDING.

I believe this one is a non-issue. Like bare metal, KVM only allows one NMI to
be pending if NMIs are blocked.

static void process_nmi(struct kvm_vcpu *vcpu)
{
unsigned limit = 2;

/*
* x86 is limited to one NMI running, and one NMI pending after it.
* If an NMI is already in progress, limit further NMIs to just one.
* Otherwise, allow two (and we'll inject the first one immediately).
*/
if (static_call(kvm_x86_get_nmi_mask)(vcpu) || vcpu->arch.nmi_injected)
limit = 1;

vcpu->arch.nmi_pending += atomic_xchg(&vcpu->arch.nmi_queued, 0);
vcpu->arch.nmi_pending = min(vcpu->arch.nmi_pending, limit);
kvm_make_request(KVM_REQ_EVENT, vcpu);
}


> > Ditto,. HW will collapse the NMI.
>
> Note, this is how bare-metal NMIs are also handled. Multiple NMIs are
> collapsed into a single NMI if they are received while an NMI is currently
> being processed.