Re: [PATCH v4 19/19] KVM: VMX: Skip VMCLEAR logic during emergency reboots if CR4.VMXE=0

From: Sean Christopherson
Date: Tue Jul 25 2023 - 14:15:15 EST


On Tue, Jul 25, 2023, Kai Huang wrote:
> On Fri, 2023-07-21 at 13:18 -0700, Sean Christopherson wrote:
> > Bail from vmx_emergency_disable() without processing the list of loaded
> > VMCSes if CR4.VMXE=0, i.e. if the CPU can't be post-VMXON. It should be
> > impossible for the list to have entries if VMX is already disabled, and
> > even if that invariant doesn't hold, VMCLEAR will #UD anyways, i.e.
> > processing the list is pointless even if it somehow isn't empty.
> >
> > Assuming no existing KVM bugs, this should be a glorified nop. The
> > primary motivation for the change is to avoid having code that looks like
> > it does VMCLEAR, but then skips VMXON, which is nonsensical.
> >
> > Suggested-by: Kai Huang <kai.huang@xxxxxxxxx>
> > Signed-off-by: Sean Christopherson <seanjc@xxxxxxxxxx>
> > ---
> > arch/x86/kvm/vmx/vmx.c | 12 ++++++++++--
> > 1 file changed, 10 insertions(+), 2 deletions(-)
> >
> > diff --git a/arch/x86/kvm/vmx/vmx.c b/arch/x86/kvm/vmx/vmx.c
> > index 5d21931842a5..0ef5ede9cb7c 100644
> > --- a/arch/x86/kvm/vmx/vmx.c
> > +++ b/arch/x86/kvm/vmx/vmx.c
> > @@ -773,12 +773,20 @@ static void vmx_emergency_disable(void)
> >
> >  	kvm_rebooting = true;
> >
> > +	/*
> > +	 * Note, CR4.VMXE can be _cleared_ in NMI context, but it can only be
> > +	 * set in task context. If this races with VMX being disabled by an NMI,
> > +	 * VMCLEAR and VMXOFF may #UD, but KVM will eat those faults due to
> > +	 * kvm_rebooting being set.
> > +	 */
>
> I am not quite following this comment. IIUC this code path is only called from
> NMI context in case of emergency VMX disable.

The CPU that initiates the emergency reboot can invoke the callback from process
context; only the responding CPUs are guaranteed to be handled via NMI shootdown.
E.g. `reboot -f` will reach this point synchronously.

> How can it race with "VMX is disabled by an NMI"?

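Somewhat theoretically, a different CPU could panic() and do an NMI shootdown of
the CPU that is handling `reboot -f`. If that NMI arrives after the task-context
code has already seen CR4.VMXE=1, the NMI path disables VMX underneath it, and the
interrupted code's subsequent VMCLEAR and VMXOFF #UD; KVM eats those faults
because kvm_rebooting is already set.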