Re: [PATCH v4 19/19] KVM: VMX: Skip VMCLEAR logic during emergency reboots if CR4.VMXE=0

From: Huang, Kai
Date: Mon Jul 24 2023 - 23:51:33 EST


On Fri, 2023-07-21 at 13:18 -0700, Sean Christopherson wrote:
> Bail from vmx_emergency_disable() without processing the list of loaded
> VMCSes if CR4.VMXE=0, i.e. if the CPU can't be post-VMXON. It should be
> impossible for the list to have entries if VMX is already disabled, and
> even if that invariant doesn't hold, VMCLEAR will #UD anyways, i.e.
> processing the list is pointless even if it somehow isn't empty.
>
> Assuming no existing KVM bugs, this should be a glorified nop. The
> primary motivation for the change is to avoid having code that looks like
> it does VMCLEAR, but then skips VMXON, which is nonsensical.
>
> Suggested-by: Kai Huang <kai.huang@xxxxxxxxx>
> Signed-off-by: Sean Christopherson <seanjc@xxxxxxxxxx>
> ---
> arch/x86/kvm/vmx/vmx.c | 12 ++++++++++--
> 1 file changed, 10 insertions(+), 2 deletions(-)
>
> diff --git a/arch/x86/kvm/vmx/vmx.c b/arch/x86/kvm/vmx/vmx.c
> index 5d21931842a5..0ef5ede9cb7c 100644
> --- a/arch/x86/kvm/vmx/vmx.c
> +++ b/arch/x86/kvm/vmx/vmx.c
> @@ -773,12 +773,20 @@ static void vmx_emergency_disable(void)
>
> 	kvm_rebooting = true;
>
> +	/*
> +	 * Note, CR4.VMXE can be _cleared_ in NMI context, but it can only be
> +	 * set in task context. If this races with VMX is disabled by an NMI,
> +	 * VMCLEAR and VMXOFF may #UD, but KVM will eat those faults due to
> +	 * kvm_rebooting set.
> +	 */

I don't quite follow this comment. IIUC, this code path is only called from
NMI context, in the emergency VMX disable case. How can it race with "VMX is
disabled by an NMI"? Shouldn't it be the normal vmx_hardware_disable() path
that may race with an NMI, rather than this one?

> +	if (!(__read_cr4() & X86_CR4_VMXE))
> +		return;
> +
> 	list_for_each_entry(v, &per_cpu(loaded_vmcss_on_cpu, cpu),
> 			    loaded_vmcss_on_cpu_link)
> 		vmcs_clear(v->vmcs);
>
> -	if (__read_cr4() & X86_CR4_VMXE)
> -		kvm_cpu_vmxoff();
> +	kvm_cpu_vmxoff();
> }
>
> static void __loaded_vmcs_clear(void *arg)

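For reference, here is a rough user-space model of the post-patch control
flow as I read it from the hunk above. It is only a sketch: struct model_cpu,
model_emergency_disable() and the counters are made-up names, not KVM
symbols, and the model assumes that the VMXOFF helper also clears CR4.VMXE,
which the quoted hunk does not show.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for per-CPU state; not a real KVM structure. */
struct model_cpu {
	bool cr4_vmxe;		/* models CR4.VMXE */
	int nr_loaded_vmcss;	/* models the length of loaded_vmcss_on_cpu */
	int vmclear_calls;	/* how many VMCLEARs the model "executed" */
	int vmxoff_calls;	/* how many VMXOFFs the model "executed" */
};

/* Models the post-patch ordering: check VMXE, then VMCLEAR, then VMXOFF. */
static void model_emergency_disable(struct model_cpu *cpu)
{
	assert(cpu != NULL);

	/* New early bail: with VMXE=0 the CPU cannot be post-VMXON. */
	if (!cpu->cr4_vmxe)
		return;

	/* One VMCLEAR per entry on the (modeled) loaded VMCS list. */
	for (int i = 0; i < cpu->nr_loaded_vmcss; i++)
		cpu->vmclear_calls++;

	/* VMXOFF is now unconditional once the VMXE check has passed. */
	cpu->vmxoff_calls++;
	cpu->cr4_vmxe = false;	/* assumption: the helper clears CR4.VMXE */
}
```

With VMXE=0 the model touches neither counter, which matches the "glorified
nop" description in the changelog. Calling it a second time on the same
struct is also a no-op, because the first call already cleared the modeled
VMXE bit.
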
Anyway, the actual code change LGTM:

Reviewed-by: Kai Huang <kai.huang@xxxxxxxxx>