Re: [PATCH v5 05/15] KVM: nVMX: Let userspace set nVMX MSR to any _host_ supported value

From: Sean Christopherson
Date: Thu Nov 10 2022 - 11:08:17 EST


On Thu, Nov 10, 2022, Yu Zhang wrote:
> > > BTW, I found my previous understanding of what vmx_adjust_secondary_exec_control()
> > > currently does was also wrong. It could also be used for EXITING controls. And
> > > for such flags (e.g., SECONDARY_EXEC_RDRAND_EXITING), values for the nested settings
> > > (vmx->nested.msrs.secondary_ctls_high) and for the L1 execution controls (*exec_control)
> > > could be opposite. So the statement:
> > > "1> For now, what vmx_adjust_secondary_exec_control() does, is to enable/
> > > disable a feature in VMX MSR(and nVMX MSR) based on cpuid changes."
> > > is wrong.
> >
> > No, it's correct. The EXITING controls are just inverted feature flags. E.g. if
> > RDRAND is disabled in CPUID, KVM sets the EXITING control so that KVM intercepts
> > RDRAND in order to inject #UD.
> >
> > [EXIT_REASON_RDRAND] = kvm_handle_invalid_op,
> >
>
> Well, suppose
> - cpu_has_vmx_rdrand() is true;
> - meanwhile guest_cpuid_has(vcpu, X86_FEATURE_RDRAND) is false.
>
> And then, what vmx_adjust_secondary_exec_control() currently does is:
> 1> keep the SECONDARY_EXEC_RDRAND_EXITING set in L1 secondary proc-
> based execution control.
> 2> and then clear the SECONDARY_EXEC_RDRAND_EXITING in the high bits
> of IA32_VMX_PROCBASED_CTLS2 MSR for nested by
> vmx->nested.msrs.secondary_ctls_high &= ~control;
> That means for the L1 VMM, SECONDARY_EXEC_RDRAND_EXITING must be cleared
> in its (VMCS12's) secondary proc-based VM-execution control, even when
> RDRAND is disabled in L1's and L2's CPUID.

Again, it is _userspace's_ responsibility to provide a sane, consistent CPU model
to the guest.
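To make the RDRAND scenario above concrete, here is a simplified model of the adjustment as it is described in this thread. It is a sketch, not the actual KVM source: the struct and function names are hypothetical, the model ignores whether nested support is enabled at all, and it assumes the caller preset *exec_control with every hardware-supported bit. RDRAND exiting is taken as bit 11 of the secondary processor-based controls, per the SDM.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Bit 11 of the secondary processor-based VM-execution controls. */
#define SECONDARY_EXEC_RDRAND_EXITING (1u << 11)

/* Hypothetical, stripped-down stand-in for the nested VMX MSR state. */
struct nested_msrs_model {
	uint32_t secondary_ctls_high;	/* allowed 1-settings exposed to L1 */
};

/* Hypothetical model of the adjustment; assumes *exec_control was preset
 * with every control the hardware supports. */
static void adjust_secondary_control(struct nested_msrs_model *msrs,
				     uint32_t *exec_control, uint32_t control,
				     bool enabled, bool exiting)
{
	/*
	 * Opt-in feature (exiting == false): clear it if the feature is
	 * hidden from the guest.  Exiting control (an inverted feature flag):
	 * clear it if the feature IS exposed, since interception is then not
	 * needed.  Both cases reduce to enabled == exiting.
	 */
	if (enabled == exiting)
		*exec_control &= ~control;

	/* L1 may set the control only if the feature is exposed in CPUID. */
	if (enabled)
		msrs->secondary_ctls_high |= control;
	else
		msrs->secondary_ctls_high &= ~control;
}
```

With RDRAND hidden from CPUID (enabled = false, exiting = true), the RDRAND-exiting control stays set for L1, so KVM intercepts RDRAND and injects #UD, while the bit is removed from the nested allowed 1-settings. That is exactly the combination pointed out above, and it is only inconsistent if userspace hands the guest an inconsistent CPU model.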

> I wonder, in a native environment, if an instruction is not supported,
> will the allowed 1-setting for its corresponding exiting feature in
> IA32_VMX_PROCBASED_CTLS2 MSR be set, or be cleared? Maybe it should
> be cleared, and executing such an instruction in non-root will just get
> a #UD directly instead of triggering a VM-Exit?

For any reasonable interpretation of the SDM, it's a moot point. The SDM doesn't
call out these scenarios for instructions like RDTSCP because they're nonsensical,
but for other instructions that can be trapped by the hypervisor and can take a
#UD even when they're supported, the #UD is prioritized over the VM-Exit. E.g. VMX
instructions have pseudocode like:

IF not in VMX operation
THEN #UD;
ELSIF in VMX non-root operation
THEN VM exit;

In other words, if the CPU doesn't recognize an instruction, it will generate a
#UD without getting to the (presumed) microcode flow that checks for VM-Exit.
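The priority order can be summarized with a small illustrative model. This is a hypothetical sketch, not a description of how any real CPU is implemented: the enum and function names are invented, and the "recognized" check stands in for whatever decode-time #UD conditions apply to a given instruction.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical outcomes for executing one instruction. */
enum insn_outcome { INSN_UD, INSN_VM_EXIT, INSN_EXECUTE };

/*
 * Illustrative priority model: the #UD check comes first, so the VM-exit
 * check is only reached for an instruction the CPU actually recognizes.
 */
static enum insn_outcome model_insn(bool cpu_recognizes_insn, bool in_non_root,
				    bool exiting_control_set)
{
	if (!cpu_recognizes_insn)
		return INSN_UD;		/* #UD wins; no VM exit is considered */
	if (in_non_root && exiting_control_set)
		return INSN_VM_EXIT;	/* hypervisor gets control */
	return INSN_EXECUTE;
}
```

In the nested RDRAND case from this thread the hardware does recognize RDRAND, so the model yields INSN_VM_EXIT, and it is KVM's exit handler (kvm_handle_invalid_op) that injects the #UD into the guest.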