Re: [PATCH 1/2] KVM: x86/pmu: Reset perf_capabilities in vcpu to 0 if PDCM is disabled

From: Sean Christopherson
Date: Wed Jan 24 2024 - 16:26:23 EST


On Wed, Jan 24, 2024, Aaron Lewis wrote:
> On Wed, Jan 24, 2024 at 7:49 AM Sean Christopherson <seanjc@xxxxxxxxxx> wrote:
> >
> > On Wed, Jan 24, 2024, Mingwei Zhang wrote:
> > > Reset vcpu->arch.perf_capabilities to 0 if PDCM is disabled in guest cpuid.
> > > Without this, there is an issue in live migration. In particular, to
> > > migrate a VM with no PDCM enabled, VMM on the source is able to retrieve a
> > > non-zero value by reading the MSR_IA32_PERF_CAPABILITIES. However, VMM on
> > > the target is unable to set the value. This creates confusions on the user
> > > side.
> > >
> > > Fundamentally, it is because vcpu->arch.perf_capabilities as the cached
> > > value of MSR_IA32_PERF_CAPABILITIES is incorrect, and there is nothing
> > > wrong on the kvm_get_msr_common() which just reads
> > > vcpu->arch.perf_capabilities.
> > >
> > > Fix the issue by adding the reset code in kvm_vcpu_after_set_cpuid(), i.e.
> > > early in VM setup time.
> > >
> > > Cc: Aaron Lewis <aaronlewis@xxxxxxxxxx>
> > > Signed-off-by: Mingwei Zhang <mizhang@xxxxxxxxxx>
> > > ---
> > > arch/x86/kvm/cpuid.c | 3 +++
> > > 1 file changed, 3 insertions(+)
> > >
> > > diff --git a/arch/x86/kvm/cpuid.c b/arch/x86/kvm/cpuid.c
> > > index adba49afb5fe..416bee03c42a 100644
> > > --- a/arch/x86/kvm/cpuid.c
> > > +++ b/arch/x86/kvm/cpuid.c
> > > @@ -369,6 +369,9 @@ static void kvm_vcpu_after_set_cpuid(struct kvm_vcpu *vcpu)
> > > vcpu->arch.maxphyaddr = cpuid_query_maxphyaddr(vcpu);
> > > vcpu->arch.reserved_gpa_bits = kvm_vcpu_reserved_gpa_bits_raw(vcpu);
> > >
> > > + /* Reset MSR_IA32_PERF_CAPABILITIES guest value to 0 if PDCM is off. */
> > > + if (!guest_cpuid_has(vcpu, X86_FEATURE_PDCM))
> > > + vcpu->arch.perf_capabilities = 0;
> >
> > No, this is just papering over the underlying bug. KVM shouldn't be stuffing
> > vcpu->arch.perf_capabilities without explicit writes from host userspace. E.g
> > KVM_SET_CPUID{,2} is allowed multiple times, at which point KVM could clobber a
> > host userspace write to MSR_IA32_PERF_CAPABILITIES. It's unlikely any userspace
> > actually does something like that, but KVM overwriting guest state is almost
> > never a good thing.
> >
> > I've been meaning to send a patch for a long time (IIRC, Aaron also ran into this?).
> > KVM needs to simply not stuff vcpu->arch.perf_capabilities. I believe we are
> > already fudging around this in our internal kernels, so I don't think there's a
> > need to carry a hack-a-fix for the destination kernel.
> >
> > diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
> > index 27e23714e960..fdef9d706d61 100644
> > --- a/arch/x86/kvm/x86.c
> > +++ b/arch/x86/kvm/x86.c
> > @@ -12116,7 +12116,6 @@ int kvm_arch_vcpu_create(struct kvm_vcpu *vcpu)
> >
> > kvm_async_pf_hash_reset(vcpu);
> >
> > - vcpu->arch.perf_capabilities = kvm_caps.supported_perf_cap;
>
> Yeah, that will fix the issue we are seeing. The only thing that's
> not clear to me is if userspace should expect KVM to set this or if
> KVM should expect userspace to set this. How is that generally
> decided?

By "this", you mean the effective RESET value for vcpu->arch.perf_capabilities?
To be consistent with KVM's CPUID module at vCPU creation, which is completely
empty (vCPU has no PMU and no PDCM support) KVM *must* zero
vcpu->arch.perf_capabilities.

If userspace wants a non-zero value, then userspace needs to set CPUID to enable
PDCM and set MSR_IA32_PERF_CAPABILITIES.

MSR_IA32_ARCH_CAPABILITIES is in the same boat, e.g. a vCPU without
X86_FEATURE_ARCH_CAPABILITIES can end up seeing a non-zero MSR value. That too
should be excised.

In a perfect world, KVM would also zero-initialize vcpu->arch.msr_platform_info,
but that one is less obviously broken and also less obviously safe to remove.

commit e53d88af63ab4104e1226b8f9959f1e9903da10b
Author: Jim Mattson <jmattson@xxxxxxxxxx>
AuthorDate: Tue Oct 30 12:20:21 2018 -0700
Commit: Paolo Bonzini <pbonzini@xxxxxxxxxx>
CommitDate: Fri Dec 14 18:00:01 2018 +0100

kvm: x86: Don't modify MSR_PLATFORM_INFO on vCPU reset

If userspace has provided a different value for this MSR (e.g with the
turbo bits set), the userspace-provided value should survive a vCPU
reset. For backwards compatibility, MSR_PLATFORM_INFO is initialized
in kvm_arch_vcpu_setup.

Signed-off-by: Jim Mattson <jmattson@xxxxxxxxxx>
Reviewed-by: Drew Schmitt <dasch@xxxxxxxxxx>
Cc: Abhiroop Dabral <adabral@xxxxxxxxxxxxxxxxxxxx>
Signed-off-by: Paolo Bonzini <pbonzini@xxxxxxxxxx>

In other words, KVM shouldn't define the vCPU model beyond the absolute bare
minimum that is required by the x86 architecture (as of P6 CPUs, which is more
or less the oldest CPU KVM can reasonably virtualize without carrying useless code).