Re: [PATCH v5 09/26] KVM: VMX: nVMX: Support TSC scaling and PERF_GLOBAL_CTRL with enlightened VMCS

From: Vitaly Kuznetsov
Date: Fri Aug 19 2022 - 04:07:14 EST


Sean Christopherson <seanjc@xxxxxxxxxx> writes:

> On Tue, Aug 02, 2022, Vitaly Kuznetsov wrote:
>> Enlightened VMCS v1 got updated and now includes the required fields
>> for TSC scaling and loading PERF_GLOBAL_CTRL upon VMENTER/VMEXIT
>> features. For Hyper-V on KVM enablement, KVM can just observe VMX control
>> MSRs and use the features (with or without eVMCS) when
>> possible. Hyper-V on KVM case is trickier because of live migration:
>> the new features require explicit enablement from VMM to not break
>> it. Luckily, the updated eVMCS revision comes with a feature bit in
>> CPUID.0x4000000A.EBX.
>
> The refactor to implement the 2-d array should go in a prep patch. Unless you
> (or anyone else) objects to the feedback below, I'll do the split myself, post v6,
> and queue this up (without patch 1, and assuming there're no major issues in the
> back half of the series).

No objections from me, thanks!

>
>> Signed-off-by: Vitaly Kuznetsov <vkuznets@xxxxxxxxxx>
>> ---
>> arch/x86/kvm/hyperv.c | 2 +-
>> arch/x86/kvm/vmx/evmcs.c | 88 +++++++++++++++++++++++++++++++--------
>> arch/x86/kvm/vmx/evmcs.h | 17 ++------
>> arch/x86/kvm/vmx/nested.c | 2 +-
>> arch/x86/kvm/vmx/vmx.c | 2 +-
>> 5 files changed, 78 insertions(+), 33 deletions(-)
>>
>> diff --git a/arch/x86/kvm/hyperv.c b/arch/x86/kvm/hyperv.c
>> index 1098915360ae..8a2b24f9bbf6 100644
>> --- a/arch/x86/kvm/hyperv.c
>> +++ b/arch/x86/kvm/hyperv.c
>> @@ -2552,7 +2552,7 @@ int kvm_get_hv_cpuid(struct kvm_vcpu *vcpu, struct kvm_cpuid2 *cpuid,
>> case HYPERV_CPUID_NESTED_FEATURES:
>> ent->eax = evmcs_ver;
>> ent->eax |= HV_X64_NESTED_MSR_BITMAP;
>> -
>> + ent->ebx |= HV_X64_NESTED_EVMCS1_2022_UPDATE;
>> break;
>>
>> case HYPERV_CPUID_SYNDBG_VENDOR_AND_MAX_FUNCTIONS:
>> diff --git a/arch/x86/kvm/vmx/evmcs.c b/arch/x86/kvm/vmx/evmcs.c
>> index 8bea5dea0341..e8497f9854a1 100644
>> --- a/arch/x86/kvm/vmx/evmcs.c
>> +++ b/arch/x86/kvm/vmx/evmcs.c
>> @@ -368,7 +368,60 @@ uint16_t nested_get_evmcs_version(struct kvm_vcpu *vcpu)
>> return 0;
>> }
>>
>> -void nested_evmcs_filter_control_msr(u32 msr_index, u64 *pdata)
>> +enum evmcs_revision {
>> + EVMCSv1_2016,
>> + EVMCSv1_2022,
>> + EVMCS_REVISION_MAX,
>
> "MAX" is technically wrong, the "max" entry is usually the last valid entry. This
> should be NR_EVMCS_REVISIONS, and then NR_EVMCS_CTRLS below.
>
>> +};
>> +
>> +enum evmcs_unsupported_ctrl_type {
>
> I think drop the "unsupported" here, because the naming gets weird when trying to
> use it for something other than indexing evmcs_unsupported_ctls (see bottom).
>
>> + EVMCS_EXIT_CTLS,
>> + EVMCS_ENTRY_CTLS,
>
> s/CTLS/CTRLS for consistency.
>
>> + EVMCS_2NDEXEC,
>> + EVMCS_PINCTRL,
>> + EVMCS_VMFUNC,
>> + EVMCS_CTRL_MAX,
>> +};
>> +
>> +static u32 evmcs_unsupported_ctls[EVMCS_CTRL_MAX][EVMCS_REVISION_MAX] = {
>
> This can be "const". And s/ctls/ctrls for consistency.
>
>> + [EVMCS_EXIT_CTLS] = {
>> + [EVMCSv1_2016] = EVMCS1_UNSUPPORTED_VMEXIT_CTRL | VM_EXIT_LOAD_IA32_PERF_GLOBAL_CTRL,
>> + [EVMCSv1_2022] = EVMCS1_UNSUPPORTED_VMEXIT_CTRL,
>> + },
>> + [EVMCS_ENTRY_CTLS] = {
>> + [EVMCSv1_2016] = EVMCS1_UNSUPPORTED_VMENTRY_CTRL | VM_ENTRY_LOAD_IA32_PERF_GLOBAL_CTRL,
>> + [EVMCSv1_2022] = EVMCS1_UNSUPPORTED_VMENTRY_CTRL,
>> + },
>> + [EVMCS_2NDEXEC] = {
>> + [EVMCSv1_2016] = EVMCS1_UNSUPPORTED_2NDEXEC | SECONDARY_EXEC_TSC_SCALING,
>> + [EVMCSv1_2022] = EVMCS1_UNSUPPORTED_2NDEXEC,
>> + },
>> + [EVMCS_PINCTRL] = {
>> + [EVMCSv1_2016] = EVMCS1_UNSUPPORTED_PINCTRL,
>> + [EVMCSv1_2022] = EVMCS1_UNSUPPORTED_PINCTRL,
>> + },
>> + [EVMCS_VMFUNC] = {
>> + [EVMCSv1_2016] = EVMCS1_UNSUPPORTED_VMFUNC,
>> + [EVMCSv1_2022] = EVMCS1_UNSUPPORTED_VMFUNC,
>> + },
>> +};
>> +
>> +static u32 evmcs_get_unsupported_ctls(struct kvm_vcpu *vcpu,
>> + enum evmcs_unsupported_ctrl_type ctrl_type)
>> +{
>> + struct kvm_vcpu_hv *hv_vcpu = to_hv_vcpu(vcpu);
>> + enum evmcs_revision evmcs_rev = EVMCSv1_2016;
>> +
>> + if (!hv_vcpu)
>
> This is a functiontal change, and I don't think it's correct either. Previously,
> KVM would apply the EVMCSv1_2016 filter irrespective of whether or not
> vcpu->arch.hyperv is non-NULL. nested_enable_evmcs() doesn't require a Hyper-V
> vCPU, and AFAICT nothing requires a Hyper-V vCPU to use eVMCS.

Indeed, this *is* correct after PATCH11 when we get rid of VMX feature
MSR filtering for KVM-on-Hyper-V as the remaining use for
evmcs_get_unsupported_ctls() is Hyper-V on KVM and hv_vcpu is not NULL
there. As of this patch, this is incorrect as we're breaking e.g. Linux
guests on KVM on Hyper-V.

>
> So I believe this should be:
>
> if (hv_vcpu &&
> hv_vcpu->cpuid_cache.nested_ebx & HV_X64_NESTED_EVMCS1_2022_UPDATE)
> evmcs_rev = EVMCSv1_2022;

This looks good to me, thanks!

>
>> + return 0;
>> +
>> + if (hv_vcpu->cpuid_cache.nested_ebx & HV_X64_NESTED_EVMCS1_2022_UPDATE)
>> + evmcs_rev = EVMCSv1_2022;
>> +
>> + return evmcs_unsupported_ctls[ctrl_type][evmcs_rev];
>> +}
>> +
>
> ...
>
>> -int nested_evmcs_check_controls(struct vmcs12 *vmcs12)
>> +int nested_evmcs_check_controls(struct kvm_vcpu *vcpu, struct vmcs12 *vmcs12)
>> {
>> int ret = 0;
>> u32 unsupp_ctl;
>>
>> unsupp_ctl = vmcs12->pin_based_vm_exec_control &
>> - EVMCS1_UNSUPPORTED_PINCTRL;
>> + evmcs_get_unsupported_ctls(vcpu, EVMCS_PINCTRL);
>> if (unsupp_ctl) {
>> trace_kvm_nested_vmenter_failed(
>> - "eVMCS: unsupported pin-based VM-execution controls",
>> + "eVMCS: unsupported pin-based VM-execution controls: ",
>> unsupp_ctl);
>
> Egad! This is all broken, at least in theory. The second param to
> trace_kvm_nested_vmenter_failed() is the error code, not the bad value. It mostly
> works because __print_symbolic() falls back to printing the hex value on error,
> but that relies on there being no collisions/matches between unsupp_ctl and the
> set of VMX_VMENTER_INSTRUCTION_ERRORS. E.g. if there's a collision then the
> tracepoint will pretty print a string and cause confusion.
>
> And checking every control even after one fails seems unnecessary subtle. It
> provides marginal benefit while increasing the probability that we screw up and
> squash "ret" to zero. Yeah, it might save some onion peeling, but if that happens
> during production and now during initial development of a feature, then someone
> has much bigger problems to worry about.
>
> Unless there are objections, I'll fold in a patch to yield:
>
> #define CC KVM_NESTED_VMENTER_CONSISTENCY_CHECK
>
> static bool nested_evmcs_is_valid_controls(enum evmcs_ctrl_type type, u32 val)
> {
> return val & evmcs_get_unsupported_ctls(type);
> }
>
> int nested_evmcs_check_controls(struct vmcs12 *vmcs12)
> {
> if (CC(!nested_evmcs_is_valid_controls(EVMCS_PINCTRL,
> vmcs12->pin_based_vm_exec_control)))
> return -EINVAL;
>
> if (CC(!nested_evmcs_is_valid_controls(EVMCS_2NDEXEC,
> vmcs12->secondary_vm_exec_control)))
> return -EINVAL;
>
> if (CC(!nested_evmcs_is_valid_controls(EVMCS_EXIT_CTRLS,
> vmcs12->vm_exit_controls)))
> return -EINVAL;
>
> if (CC(!nested_evmcs_is_valid_controls(EVMCS_ENTRY_CTRLS,
> vmcs12->vm_entry_controls)))
> return -EINVAL;
>
> if (CC(!nested_evmcs_is_valid_controls(EVMCS_VMFUNC,
> vmcs12->vm_function_control)))
> return -EINVAL;
>
> return 0;
> }


Well, it's actually nice to see all the controls where KVM screwed up
without the need to instrument kernel, this way when someone comes with
an issue you can immediately see what's wrong in the trace
log. Honestly, I don't remember these firing outside of my development
environment, your patch to make things correct looks good to me.

--
Vitaly