Re: [PATCH 7/7] KVM: VMX: Handle NMI VM-Exits in noinstr region

From: Sean Christopherson
Date: Thu Aug 24 2023 - 10:17:18 EST


On Thu, Aug 24, 2023, Like Xu wrote:
> On 13/12/2022 2:09 pm, Sean Christopherson wrote:
> > Move VMX's handling of NMI VM-Exits into vmx_vcpu_enter_exit() so that
> > the NMI is handled prior to leaving the safety of noinstr. Handling the
> > NMI after leaving noinstr exposes the kernel to potential ordering
> > problems as an instrumentation-induced fault, e.g. #DB, #BP, #PF, etc.
> > will unblock NMIs when IRETing back to the faulting instruction.
> (3) In addition, trace_kvm_exit() should ideally appear before the host NMI
> trace logs, which would make the traces easier to follow.

Ideally, yes, but tracepoints are not remotely noinstr friendly.
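
To make that concrete: a tracepoint expands to static-branch checks plus
ftrace/perf callbacks, all of which are instrumentable code. Hoisting the
tracepoint into the noinstr exit path would require an explicit
instrumentation window, and any instrumentation-induced fault taken in that
window IRETs and unblocks NMIs before the NMI VM-Exit has been serviced.
Rough sketch of the shape of the problem (illustrative only, not proposed
code; the function name is made up):

/* Illustrative sketch, not proposed code. */
static noinstr void sketch_vcpu_enter_exit(struct kvm_vcpu *vcpu)
{
        /* ... VM-Exit occurred, NMIs are still blocked by the CPU ... */

        instrumentation_begin();
        /*
         * Instrumentable code: a #DB/#BP/#PF taken here IRETs back to the
         * faulting instruction and unblocks NMIs before the NMI VM-Exit
         * has been handled.
         */
        trace_kvm_exit(vcpu, KVM_ISA_VMX);
        instrumentation_end();

        /* Too late: the NMI blocking window has already been reopened. */
        vmx_do_nmi_irqoff();
}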

> A proposed fix is to delay vmx_do_nmi_irqoff() slightly, rather than reverting the move:
>
> diff --git a/arch/x86/kvm/vmx/vmx.c b/arch/x86/kvm/vmx/vmx.c
> index e6849f780dba..1f29b7f22da7 100644
> --- a/arch/x86/kvm/vmx/vmx.c
> +++ b/arch/x86/kvm/vmx/vmx.c
> @@ -7230,13 +7230,6 @@ static noinstr void vmx_vcpu_enter_exit(struct kvm_vcpu *vcpu,
>          else
>                  vmx->exit_reason.full = vmcs_read32(VM_EXIT_REASON);
> 
> -        if ((u16)vmx->exit_reason.basic == EXIT_REASON_EXCEPTION_NMI &&
> -            is_nmi(vmx_get_intr_info(vcpu))) {
> -                kvm_before_interrupt(vcpu, KVM_HANDLING_NMI);
> -                vmx_do_nmi_irqoff();
> -                kvm_after_interrupt(vcpu);
> -        }
> -
>          guest_state_exit_irqoff();
>  }
>
> @@ -7389,6 +7382,13 @@ static fastpath_t vmx_vcpu_run(struct kvm_vcpu *vcpu)
>
>          trace_kvm_exit(vcpu, KVM_ISA_VMX);
> 
> +        if ((u16)vmx->exit_reason.basic == EXIT_REASON_EXCEPTION_NMI &&
> +            is_nmi(vmx_get_intr_info(vcpu))) {
> +                kvm_before_interrupt(vcpu, KVM_HANDLING_NMI);
> +                vmx_do_nmi_irqoff();
> +                kvm_after_interrupt(vcpu);
> +        }

No, the whole point of doing NMI handling in vmx_vcpu_enter_exit() is so that NMIs
are serviced before instrumentation is enabled.

I think the below is sufficient (untested at this point). Not quite minimal, e.g.
I'm pretty sure there's (currently) no need to snapshot IDT_VECTORING_INFO_FIELD
so early, but I can't think of any reason to wait.
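
For reference, the lazy exit-info caching that makes the ordering matter
looks roughly like the below (paraphrased from memory of the helper in
arch/x86/kvm/vmx/vmx.h; details may differ):

/* Paraphrased sketch of the lazy VM_EXIT_INTR_INFO caching. */
static inline u32 vmx_get_intr_info(struct kvm_vcpu *vcpu)
{
        struct vcpu_vmx *vmx = to_vmx(vcpu);

        if (!kvm_register_is_available(vcpu, VCPU_EXREG_EXIT_INFO_2)) {
                kvm_register_mark_available(vcpu, VCPU_EXREG_EXIT_INFO_2);
                vmx->exit_intr_info = vmcs_read32(VM_EXIT_INTR_INFO);
        }

        return vmx->exit_intr_info;
}

Unless regs_avail has already been reset for the current exit, the
is_nmi(vmx_get_intr_info(vcpu)) check in vmx_vcpu_enter_exit() returns
whatever a previous exit cached, hence moving the regs_avail clearing (and
the idt_vectoring_info snapshot) ahead of the NMI check: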

--
From: Sean Christopherson <seanjc@xxxxxxxxxx>
Date: Thu, 24 Aug 2023 06:49:36 -0700
Subject: [PATCH] KVM: VMX: Refresh available regs and IDT vectoring info
before NMI handling

Reset the mask of available "registers" and refresh the IDT vectoring
info snapshot in vmx_vcpu_enter_exit(), before KVM potentially handles
an NMI VM-Exit. One of the "registers" that KVM VMX lazily loads is the
vmcs.VM_EXIT_INTR_INFO field, which holds the vector+type on "exception
or NMI" VM-Exits, i.e. is needed to identify NMIs. Clearing the available
registers bitmask after handling NMIs results in KVM querying info from
the last VM-Exit that read vmcs.VM_EXIT_INTR_INFO, and leads to both
missed NMIs and spurious NMIs from the guest's perspective.

Opportunistically grab vmcs.IDT_VECTORING_INFO_FIELD early in the VM-Exit
path too, e.g. to guard against similar consumption of stale data. The
field is read on every "normal" VM-Exit, and there's no point in delaying
the inevitable.

Reported-by: Like Xu <like.xu.linux@xxxxxxxxx>
Fixes: 11df586d774f ("KVM: VMX: Handle NMI VM-Exits in noinstr region")
Cc: stable@xxxxxxxxxxxxxxx
Signed-off-by: Sean Christopherson <seanjc@xxxxxxxxxx>
---
arch/x86/kvm/vmx/vmx.c | 21 +++++++++++----------
1 file changed, 11 insertions(+), 10 deletions(-)

diff --git a/arch/x86/kvm/vmx/vmx.c b/arch/x86/kvm/vmx/vmx.c
index e6849f780dba..d2b78ab7a9f2 100644
--- a/arch/x86/kvm/vmx/vmx.c
+++ b/arch/x86/kvm/vmx/vmx.c
@@ -7222,13 +7222,20 @@ static noinstr void vmx_vcpu_enter_exit(struct kvm_vcpu *vcpu,
                                    flags);
 
         vcpu->arch.cr2 = native_read_cr2();
+        vcpu->arch.regs_avail &= ~VMX_REGS_LAZY_LOAD_SET;
+
+        vmx->idt_vectoring_info = 0;
 
         vmx_enable_fb_clear(vmx);
 
-        if (unlikely(vmx->fail))
+        if (unlikely(vmx->fail)) {
                 vmx->exit_reason.full = 0xdead;
-        else
-                vmx->exit_reason.full = vmcs_read32(VM_EXIT_REASON);
+                goto out;
+        }
+
+        vmx->exit_reason.full = vmcs_read32(VM_EXIT_REASON);
+        if (likely(!vmx->exit_reason.failed_vmentry))
+                vmx->idt_vectoring_info = vmcs_read32(IDT_VECTORING_INFO_FIELD);
 
         if ((u16)vmx->exit_reason.basic == EXIT_REASON_EXCEPTION_NMI &&
             is_nmi(vmx_get_intr_info(vcpu))) {
@@ -7237,6 +7244,7 @@ static noinstr void vmx_vcpu_enter_exit(struct kvm_vcpu *vcpu,
                 kvm_after_interrupt(vcpu);
         }
 
+out:
         guest_state_exit_irqoff();
 }
 
@@ -7358,8 +7366,6 @@ static fastpath_t vmx_vcpu_run(struct kvm_vcpu *vcpu)
         loadsegment(es, __USER_DS);
 #endif
 
-        vcpu->arch.regs_avail &= ~VMX_REGS_LAZY_LOAD_SET;
-
         pt_guest_exit(vmx);
 
         kvm_load_host_xsave_state(vcpu);
@@ -7376,17 +7382,12 @@ static fastpath_t vmx_vcpu_run(struct kvm_vcpu *vcpu)
                 vmx->nested.nested_run_pending = 0;
         }
 
-        vmx->idt_vectoring_info = 0;
-
         if (unlikely(vmx->fail))
                 return EXIT_FASTPATH_NONE;
 
         if (unlikely((u16)vmx->exit_reason.basic == EXIT_REASON_MCE_DURING_VMENTRY))
                 kvm_machine_check();
 
-        if (likely(!vmx->exit_reason.failed_vmentry))
-                vmx->idt_vectoring_info = vmcs_read32(IDT_VECTORING_INFO_FIELD);
-
         trace_kvm_exit(vcpu, KVM_ISA_VMX);
 
         if (unlikely(vmx->exit_reason.failed_vmentry))

base-commit: fff2e47e6c3b8050ca26656693caa857e3a8b740
--
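
For completeness, with the patch applied the tail of vmx_vcpu_enter_exit()
ends up ordered as below (reconstructed from the diff above, untested):

        vcpu->arch.cr2 = native_read_cr2();
        vcpu->arch.regs_avail &= ~VMX_REGS_LAZY_LOAD_SET;

        vmx->idt_vectoring_info = 0;

        vmx_enable_fb_clear(vmx);

        if (unlikely(vmx->fail)) {
                vmx->exit_reason.full = 0xdead;
                goto out;
        }

        vmx->exit_reason.full = vmcs_read32(VM_EXIT_REASON);
        if (likely(!vmx->exit_reason.failed_vmentry))
                vmx->idt_vectoring_info = vmcs_read32(IDT_VECTORING_INFO_FIELD);

        if ((u16)vmx->exit_reason.basic == EXIT_REASON_EXCEPTION_NMI &&
            is_nmi(vmx_get_intr_info(vcpu))) {
                kvm_before_interrupt(vcpu, KVM_HANDLING_NMI);
                vmx_do_nmi_irqoff();
                kvm_after_interrupt(vcpu);
        }

out:
        guest_state_exit_irqoff();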