Re: [PATCH] KVM: x86: Make sure KVM_CPUID_FEATURES really are KVM_CPUID_FEATURES

From: Sean Christopherson
Date: Thu Nov 04 2021 - 15:23:04 EST


On Thu, Nov 04, 2021, Paul Durrant wrote:
> From: Paul Durrant <pdurrant@xxxxxxxxxx>
>
> Currently when kvm_update_cpuid_runtime() runs, it assumes that the
> KVM_CPUID_FEATURES leaf is located at 0x40000001. This is not true,
> however, if Hyper-V support is enabled. In this case the KVM leaves will
> be offset.
>
> This patch introduces a new 'kvm_cpuid_base' field into struct
> kvm_vcpu_arch to track the location of the KVM leaves and function
> kvm_update_cpuid_base() (called from kvm_update_cpuid_runtime()) to locate
> the leaves using the 'KVMKVMKVM\0\0\0' signature. Adjustment of
> KVM_CPUID_FEATURES will hence now target the correct leaf.
>
> Signed-off-by: Paul Durrant <pdurrant@xxxxxxxxxx>
> ---
> Cc: Paolo Bonzini <pbonzini@xxxxxxxxxx>
> Cc: Sean Christopherson <seanjc@xxxxxxxxxx>
> Cc: Vitaly Kuznetsov <vkuznets@xxxxxxxxxx>
> Cc: Wanpeng Li <wanpengli@xxxxxxxxxxx>
> Cc: Jim Mattson <jmattson@xxxxxxxxxx>
> Cc: Joerg Roedel <joro@xxxxxxxxxx>

scripts/get_maintainer.pl is your friend :-)

> ---
> arch/x86/include/asm/kvm_host.h | 1 +
> arch/x86/kvm/cpuid.c | 50 +++++++++++++++++++++++++++++----
> 2 files changed, 46 insertions(+), 5 deletions(-)
>
> diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
> index 88fce6ab4bbd..21133ffa23e9 100644
> --- a/arch/x86/include/asm/kvm_host.h
> +++ b/arch/x86/include/asm/kvm_host.h
> @@ -725,6 +725,7 @@ struct kvm_vcpu_arch {
>
>  	int cpuid_nent;
>  	struct kvm_cpuid_entry2 *cpuid_entries;
> +	u32 kvm_cpuid_base;
>
>  	u64 reserved_gpa_bits;
>  	int maxphyaddr;
> diff --git a/arch/x86/kvm/cpuid.c b/arch/x86/kvm/cpuid.c
> index 2d70edb0f323..2cfb8ec4f570 100644
> --- a/arch/x86/kvm/cpuid.c
> +++ b/arch/x86/kvm/cpuid.c
> @@ -99,11 +99,46 @@ static int kvm_check_cpuid(struct kvm_cpuid_entry2 *entries, int nent)
>  	return 0;
>  }
>
> +static void kvm_update_cpuid_base(struct kvm_vcpu *vcpu)
> +{
> +	u32 function;
> +
> +	for (function = 0x40000000; function < 0x40010000; function += 0x100) {

No small part of me wants to turn hypervisor_cpuid_base() into a macro, but that's
probably more pain than gain. But I do think it would be worth providing a macro
to iterate over possible bases and share that with the guest-side code.
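
For illustration only, such a shared iterator might look like the sketch below. It assumes the same window the patch walks (0x40000000 up to, but not including, 0x40010000, in 0x100 steps); the macro and constant names are hypothetical rather than existing kernel symbols, and count_possible_bases() exists only to exercise the macro.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical constants describing the hypervisor CPUID window. */
#define HYPERVISOR_CPUID_BASE_FIRST	0x40000000u
#define HYPERVISOR_CPUID_BASE_END	0x40010000u	/* exclusive */
#define HYPERVISOR_CPUID_BASE_STEP	0x100u

/* Hypothetical shared iterator: visits every 0x100-aligned base in the window. */
#define virt_for_each_possible_hypervisor_base(base)		\
	for ((base) = HYPERVISOR_CPUID_BASE_FIRST;		\
	     (base) < HYPERVISOR_CPUID_BASE_END;		\
	     (base) += HYPERVISOR_CPUID_BASE_STEP)

/* Walks the window once and reports how many candidate bases were visited. */
static unsigned int count_possible_bases(uint32_t *first, uint32_t *last)
{
	uint32_t base;
	unsigned int count = 0;

	assert(first && last);

	virt_for_each_possible_hypervisor_base(base) {
		if (count == 0)
			*first = base;
		*last = base;
		count++;
	}
	return count;
}
```

The window holds 256 candidate bases, 0x40000000 through 0x4000ff00, so host and guest code would agree on the bounds by construction.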

> +		struct kvm_cpuid_entry2 *best = kvm_find_cpuid_entry(vcpu, function, 0);

Declare "struct kvm_cpuid_entry2 *best" outside of the loop to shorten this line.
I'd also vote to rename "best" to "entry". KVM's "best" terminology is a remnant
of misguided logic that applied Intel's bizarre out-of-range behavior to internal
KVM lookups.

> +
> +		if (best) {
> +			char signature[12];
> +
> +			*(u32 *)&signature[0] = best->ebx;

Just make signature a u32[3], then the casting craziness goes away.
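
A minimal sketch of the u32[3] approach, assuming a little-endian host (x86) and a hypothetical shared KVM_CPUID_SIG #define; cpuid_sig_matches() is an illustrative helper, not existing KVM code. Because the registers land in memory in EBX, ECX, EDX order, memcmp() can compare them against the string directly with no casts:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical shared #define for the 12-byte KVM signature. */
#define KVM_CPUID_SIG "KVMKVMKVM\0\0\0"

/*
 * Compares the EBX/ECX/EDX values of a hypervisor base leaf against a
 * 12-byte signature. Storing the registers in a u32[3] avoids pointer
 * casts; on a little-endian host the in-memory byte order matches the
 * order in which CPUID reports the signature characters.
 */
static bool cpuid_sig_matches(uint32_t ebx, uint32_t ecx, uint32_t edx,
			      const char *sig)
{
	uint32_t signature[3] = { ebx, ecx, edx };

	return !memcmp(signature, sig, sizeof(signature));
}
```

On x86, the KVM signature leaf reports EBX=0x4b4d564b ("KVMK"), ECX=0x564b4d56 ("VMKV") and EDX=0x4d ("M"), so cpuid_sig_matches(0x4b4d564b, 0x564b4d56, 0x4d, KVM_CPUID_SIG) returns true.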

> +			*(u32 *)&signature[4] = best->ecx;
> +			*(u32 *)&signature[8] = best->edx;
> +
> +			if (!memcmp(signature, "KVMKVMKVM\0\0\0", 12))

The "KVMKVMKVM\0\0\0" magic string belongs in a #define that's shared with the
guest-side code.

> +				break;
> +		}
> +	}
> +	vcpu->arch.kvm_cpuid_base = function;

Unconditionally setting kvm_cpuid_base is silly because then kvm_get_cpuid_base()
needs to check multiple "error" values.

E.g. all of the above can be done as:

	struct kvm_cpuid_entry2 *entry;
	u32 base, signature[3];

	vcpu->arch.kvm_cpuid_base = 0;

	virt_for_each_possible_hypervisor_base(base) {
		entry = kvm_find_cpuid_entry(vcpu, base, 0);
		if (!entry)
			continue;

		signature[0] = entry->ebx;
		signature[1] = entry->ecx;
		signature[2] = entry->edx;

		if (!memcmp(signature, KVM_CPUID_SIG, sizeof(signature))) {
			vcpu->arch.kvm_cpuid_base = base;
			break;
		}
	}
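
For reference, a self-contained user-space sketch of that lookup. It assumes simplified stand-in types (struct cpuid_entry, struct vcpu_cpuid) in place of struct kvm_cpuid_entry2 and struct kvm_vcpu, spells the hypothetical iterator out as a plain loop, and uses a hypothetical KVM_CPUID_SIG #define. As in the suggestion, the cached base is cleared first and 0 means "not found":

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define KVM_CPUID_SIG "KVMKVMKVM\0\0\0"	/* hypothetical shared #define */

/* Simplified stand-in for struct kvm_cpuid_entry2 (index/flags omitted). */
struct cpuid_entry {
	uint32_t function;
	uint32_t eax, ebx, ecx, edx;
};

/* Stand-in for the vCPU's CPUID state plus the cached KVM base. */
struct vcpu_cpuid {
	const struct cpuid_entry *entries;
	size_t nent;
	uint32_t kvm_cpuid_base;	/* 0 means "no KVM leaves found" */
};

/* Linear search by leaf number, standing in for kvm_find_cpuid_entry(..., 0). */
static const struct cpuid_entry *find_entry(const struct vcpu_cpuid *v,
					    uint32_t function)
{
	for (size_t i = 0; i < v->nent; i++)
		if (v->entries[i].function == function)
			return &v->entries[i];
	return NULL;
}

/* Caches the first base whose EBX/ECX/EDX spell the KVM signature. */
static void update_kvm_cpuid_base(struct vcpu_cpuid *v)
{
	const struct cpuid_entry *entry;
	uint32_t base, signature[3];

	v->kvm_cpuid_base = 0;

	for (base = 0x40000000u; base < 0x40010000u; base += 0x100u) {
		entry = find_entry(v, base);
		if (!entry)
			continue;

		signature[0] = entry->ebx;
		signature[1] = entry->ecx;
		signature[2] = entry->edx;

		if (!memcmp(signature, KVM_CPUID_SIG, sizeof(signature))) {
			v->kvm_cpuid_base = base;
			break;
		}
	}
}
```

Because 0 can never be a valid base, callers need a single !base check instead of a range check.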

> +}
> +
> +static inline bool kvm_get_cpuid_base(struct kvm_vcpu *vcpu, u32 *function)
> +{
> +	if (vcpu->arch.kvm_cpuid_base < 0x40000000 ||
> +	    vcpu->arch.kvm_cpuid_base >= 0x40010000)
> +		return false;
> +
> +	*function = vcpu->arch.kvm_cpuid_base;
> +	return true;

If '0' is the "doesn't exist" value, then this helper goes away.

> +}
> +
>  void kvm_update_pv_runtime(struct kvm_vcpu *vcpu)
>  {
> +	u32 base;
>  	struct kvm_cpuid_entry2 *best;
>
> -	best = kvm_find_cpuid_entry(vcpu, KVM_CPUID_FEATURES, 0);
> +	if (!kvm_get_cpuid_base(vcpu, &base))
> +		return;

... and then this becomes:

	if (!vcpu->arch.kvm_cpuid_base)
		return;

Actually, since this is a repeated pattern and is likely going to be limited to
getting KVM_CPUID_FEATURES, just add:

static struct kvm_cpuid_entry2 *kvm_find_kvm_cpuid_features(struct kvm_vcpu *vcpu)
{
	u32 base = vcpu->arch.kvm_cpuid_base;

	if (!base)
		return NULL;

	return kvm_find_cpuid_entry(vcpu, base | KVM_CPUID_FEATURES, 0);
}

and then all of the indentation churn goes away.
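
A user-space sketch of that helper plus a caller, assuming simplified stand-in types rather than KVM's structures and a cached base that is 0 when no KVM signature was found. KVM_CPUID_FEATURES (0x40000001) and KVM_FEATURE_PV_UNHALT (bit 7) follow KVM's uapi kvm_para.h; the function names are hypothetical:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define KVM_CPUID_FEATURES	0x40000001u
#define KVM_FEATURE_PV_UNHALT	7	/* bit number in KVM_CPUID_FEATURES.EAX */

/* Simplified stand-in for struct kvm_cpuid_entry2. */
struct cpuid_entry {
	uint32_t function;
	uint32_t eax;
};

/* Stand-in for the vCPU's CPUID state; kvm_cpuid_base is 0 if no KVM leaves. */
struct vcpu_cpuid {
	struct cpuid_entry *entries;
	size_t nent;
	uint32_t kvm_cpuid_base;
};

static struct cpuid_entry *find_entry(struct vcpu_cpuid *v, uint32_t function)
{
	for (size_t i = 0; i < v->nent; i++)
		if (v->entries[i].function == function)
			return &v->entries[i];
	return NULL;
}

/* Hypothetical analogue of kvm_find_kvm_cpuid_features(). */
static struct cpuid_entry *find_kvm_cpuid_features(struct vcpu_cpuid *v)
{
	uint32_t base = v->kvm_cpuid_base;

	if (!base)
		return NULL;

	return find_entry(v, base | KVM_CPUID_FEATURES);
}

/* Caller: one flat check instead of a nested "if (have base) { ... }" block. */
static void hide_pv_unhalt(struct vcpu_cpuid *v, bool hlt_in_guest)
{
	struct cpuid_entry *entry = find_kvm_cpuid_features(v);

	if (hlt_in_guest && entry)
		entry->eax &= ~(1u << KVM_FEATURE_PV_UNHALT);
}
```

Clearing the bit unconditionally is harmless when it is already clear, so the caller does not need to test it first.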

> +
> +	best = kvm_find_cpuid_entry(vcpu, base + KVM_CPUID_FEATURES, 0);
>
>  	/*
>  	 * save the feature bitmap to avoid cpuid lookup for every PV
> @@ -116,6 +151,7 @@ void kvm_update_pv_runtime(struct kvm_vcpu *vcpu)
>  void kvm_update_cpuid_runtime(struct kvm_vcpu *vcpu)
>  {
>  	struct kvm_cpuid_entry2 *best;
> +	u32 base;
>
>  	best = kvm_find_cpuid_entry(vcpu, 1, 0);
>  	if (best) {
> @@ -142,10 +178,14 @@ void kvm_update_cpuid_runtime(struct kvm_vcpu *vcpu)
>  		     cpuid_entry_has(best, X86_FEATURE_XSAVEC)))
>  		best->ebx = xstate_required_size(vcpu->arch.xcr0, true);
>
> -	best = kvm_find_cpuid_entry(vcpu, KVM_CPUID_FEATURES, 0);
> -	if (kvm_hlt_in_guest(vcpu->kvm) && best &&
> -	    (best->eax & (1 << KVM_FEATURE_PV_UNHALT)))
> -		best->eax &= ~(1 << KVM_FEATURE_PV_UNHALT);
> +	kvm_update_cpuid_base(vcpu);

The KVM base doesn't need to be rechecked for runtime updates. Runtime updates
are to handle changes in guest state, e.g. reported XSAVE size in response to a
CR4.OSXSAVE change. The raw CPUID entries themselves cannot change at runtime.
I suspect you did this here because kvm_update_cpuid_runtime() is called before
kvm_vcpu_after_set_cpuid(), but that has the very bad side effect of doing an
_expensive_ lookup on every runtime update, which can get very painful if there's
no KVM_CPUID_FEATURES to be found.
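
To put a rough number on "expensive": assuming a linear lookup over the vCPU's CPUID entries, as kvm_find_cpuid_entry() performs, scanning all 256 candidate bases when no hypervisor leaves exist compares against every entry 256 times. A hypothetical user-space sketch that counts those comparisons (stand-in types, not KVM code):

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Simplified stand-in for struct kvm_cpuid_entry2: only the leaf number. */
struct cpuid_entry {
	uint32_t function;
};

/* Linear lookup that counts every entry it examines. */
static const struct cpuid_entry *find_entry(const struct cpuid_entry *entries,
					    size_t nent, uint32_t function,
					    unsigned long *compares)
{
	for (size_t i = 0; i < nent; i++) {
		(*compares)++;
		if (entries[i].function == function)
			return &entries[i];
	}
	return NULL;
}

/*
 * Looks up every candidate hypervisor base (the signature check is omitted
 * because only the lookup cost matters here) and returns the total number
 * of entry comparisons performed.
 */
static unsigned long base_scan_compares(const struct cpuid_entry *entries,
					size_t nent)
{
	unsigned long compares = 0;

	for (uint32_t base = 0x40000000u; base < 0x40010000u; base += 0x100u)
		(void)find_entry(entries, nent, base, &compares);
	return compares;
}
```

With, say, 40 entries, a single failed scan costs 10,240 comparisons, which is why doing it once in kvm_set_cpuid() beats repeating it on every runtime update.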

If you include the prep patch (pasted at the bottom), then this can simply be
(note the somewhat silly name; I think it's worth clarifying that it's the
KVM_CPUID_* base that's being updated):

diff --git a/arch/x86/kvm/cpuid.c b/arch/x86/kvm/cpuid.c
index 0c99d2731076..5dd8c26e9f86 100644
--- a/arch/x86/kvm/cpuid.c
+++ b/arch/x86/kvm/cpuid.c
@@ -245,6 +245,7 @@ static int kvm_set_cpuid(struct kvm_vcpu *vcpu, struct kvm_cpuid_entry2 *e2,
 	vcpu->arch.cpuid_entries = e2;
 	vcpu->arch.cpuid_nent = nent;
 
+	kvm_update_kvm_cpuid_base(vcpu);
 	kvm_update_cpuid_runtime(vcpu);
 	kvm_vcpu_after_set_cpuid(vcpu);

> +
> +	if (kvm_get_cpuid_base(vcpu, &base)) {
> +		best = kvm_find_cpuid_entry(vcpu, base + KVM_CPUID_FEATURES, 0);

This is wrong. base will be >= 0x40000000 and < 0x40010000, and KVM_CPUID_FEATURES
is 0x40000001, i.e. this will look up 0x80000001 for the default base. The '+'
needs to be an '|'.
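
The arithmetic is easy to check in isolation. A minimal sketch with hypothetical helper names, taking KVM_CPUID_FEATURES as 0x40000001 as stated above:

```c
#include <assert.h>
#include <stdint.h>

#define KVM_CPUID_FEATURES 0x40000001u

/* Buggy: adds the full leaf number, so the 0x40000000 part is counted twice. */
static uint32_t features_leaf_add(uint32_t base)
{
	return base + KVM_CPUID_FEATURES;
}

/* Fixed: base already carries 0x40000000, so OR only merges in the 0x01 offset. */
static uint32_t features_leaf_or(uint32_t base)
{
	return base | KVM_CPUID_FEATURES;
}
```

OR works because every candidate base already has bit 30 (0x40000000) set and is 0x100-aligned; base + (KVM_CPUID_FEATURES - 0x40000000) would be an equivalent spelling.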

> +		if (kvm_hlt_in_guest(vcpu->kvm) && best &&
> +		    (best->eax & (1 << KVM_FEATURE_PV_UNHALT)))
> +			best->eax &= ~(1 << KVM_FEATURE_PV_UNHALT);
> +	}
>
>  	if (!kvm_check_has_quirk(vcpu->kvm, KVM_X86_QUIRK_MISC_ENABLE_NO_MWAIT)) {
>  		best = kvm_find_cpuid_entry(vcpu, 0x1, 0);
> --
> 2.20.1