Re: [PATCH 1/2] KVM: x86/pmu: Reset perf_capabilities in vcpu to 0 if PDCM is disabled

From: Aaron Lewis
Date: Wed Jan 24 2024 - 16:04:44 EST


On Wed, Jan 24, 2024 at 7:49 AM Sean Christopherson <seanjc@xxxxxxxxxx> wrote:
>
> On Wed, Jan 24, 2024, Mingwei Zhang wrote:
> > Reset vcpu->arch.perf_capabilities to 0 if PDCM is disabled in guest cpuid.
> > Without this, there is an issue in live migration. In particular, to
> > migrate a VM with no PDCM enabled, VMM on the source is able to retrieve a
> > non-zero value by reading the MSR_IA32_PERF_CAPABILITIES. However, VMM on
> > the target is unable to set the value. This creates confusions on the user
> > side.
> >
> > Fundamentally, it is because vcpu->arch.perf_capabilities as the cached
> > value of MSR_IA32_PERF_CAPABILITIES is incorrect, and there is nothing
> > wrong on the kvm_get_msr_common() which just reads
> > vcpu->arch.perf_capabilities.
> >
> > Fix the issue by adding the reset code in kvm_vcpu_after_set_cpuid(), ie.
> > early in VM setup time.
> >
> > Cc: Aaron Lewis <aaronlewis@xxxxxxxxxx>
> > Signed-off-by: Mingwei Zhang <mizhang@xxxxxxxxxx>
> > ---
> > arch/x86/kvm/cpuid.c | 3 +++
> > 1 file changed, 3 insertions(+)
> >
> > diff --git a/arch/x86/kvm/cpuid.c b/arch/x86/kvm/cpuid.c
> > index adba49afb5fe..416bee03c42a 100644
> > --- a/arch/x86/kvm/cpuid.c
> > +++ b/arch/x86/kvm/cpuid.c
> > @@ -369,6 +369,9 @@ static void kvm_vcpu_after_set_cpuid(struct kvm_vcpu *vcpu)
> > vcpu->arch.maxphyaddr = cpuid_query_maxphyaddr(vcpu);
> > vcpu->arch.reserved_gpa_bits = kvm_vcpu_reserved_gpa_bits_raw(vcpu);
> >
> > + /* Reset MSR_IA32_PERF_CAPABILITIES guest value to 0 if PDCM is off. */
> > + if (!guest_cpuid_has(vcpu, X86_FEATURE_PDCM))
> > + vcpu->arch.perf_capabilities = 0;
>
> No, this is just papering over the underlying bug. KVM shouldn't be stuffing
> vcpu->arch.perf_capabilities without explicit writes from host userspace. E.g
> KVM_SET_CPUID{,2} is allowed multiple times, at which point KVM could clobber a
> host userspace write to MSR_IA32_PERF_CAPABILITIES. It's unlikely any userspace
> actually does something like that, but KVM overwriting guest state is almost
> never a good thing.
>
> I've been meaning to send a patch for a long time (IIRC, Aaron also ran into this?).
> KVM needs to simply not stuff vcpu->arch.perf_capabilities. I believe we are
> already fudging around this in our internal kernels, so I don't think there's a
> need to carry a hack-a-fix for the destination kernel.
>
> diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
> index 27e23714e960..fdef9d706d61 100644
> --- a/arch/x86/kvm/x86.c
> +++ b/arch/x86/kvm/x86.c
> @@ -12116,7 +12116,6 @@ int kvm_arch_vcpu_create(struct kvm_vcpu *vcpu)
>
> kvm_async_pf_hash_reset(vcpu);
>
> - vcpu->arch.perf_capabilities = kvm_caps.supported_perf_cap;

Yeah, that will fix the issue we are seeing. The only thing that's
not clear to me is if userspace should expect KVM to set this or if
KVM should expect userspace to set this. How is that generally
decided?

> kvm_pmu_init(vcpu);
>
> vcpu->arch.pending_external_vector = -1;
>
> > kvm_pmu_refresh(vcpu);
> > vcpu->arch.cr4_guest_rsvd_bits =
> > __cr4_reserved_bits(guest_cpuid_has, vcpu);
> > --
> > 2.43.0.429.g432eaa2c6b-goog
> >