Re: [PATCH] KVM: x86/pmu: Return correct value of IA32_PERF_CAPABILITIES for userspace after vCPU has run

From: Sean Christopherson
Date: Wed Mar 13 2024 - 10:42:45 EST


On Wed, Mar 13, 2024, Wei W Wang wrote:
> On Wednesday, March 13, 2024 8:38 AM, Mingwei Zhang wrote:
> > Return correct value of IA32_PERF_CAPABILITIES when userspace tries to read
> > it after vCPU has already run. Previously, KVM will always return the guest
> > cached value on get_msr() even if guest CPUID lacks X86_FEATURE_PDCM. The
> > guest cached value on default is kvm_caps.supported_perf_cap. However,
> > when userspace sets the value during live migration, the call fails because of
> > the check on X86_FEATURE_PDCM.
>
> Could you point where in the set_msr path that could fail?
> (I don’t find there is a check of X86_FEATURE_PDCM in vmx_set_msr and
> kvm_set_msr_common)

The changelog is misleading, it's not the PDCM feature bit, it's the PMU version
check in vmx_set_msr():

case MSR_IA32_PERF_CAPABILITIES:
if (data && !vcpu_to_pmu(vcpu)->version)
return 1;

> > Initially, it sounds like a pure userspace issue. It is not. After vCPU has run,
> > KVM should faithfully return correct value to satisify legitimate requests from
> > userspace such as VM suspend/resume and live migrartion. In this case, KVM
> > should return 0 when guest cpuid lacks X86_FEATURE_PDCM.
> Some typos above (satisfy, migration, CPUID)
>
> Seems the description here isn’t aligned to your code below?
> The code below prevents userspace from reading the MSR value (not return 0 as the
> read value) in that case.

Ya.

> >So fix the
> > problem by adding an additional check in vmx_set_msr().
> >
> > Note that IA32_PERF_CAPABILITIES is emulated on AMD side, which is fine
> > because it set_msr() is guarded by kvm_caps.supported_perf_cap which is
> > always 0.
> >
> > Cc: Aaron Lewis <aaronlewis@xxxxxxxxxx>
> > Cc: Jim Mattson <jmattson@xxxxxxxxxx>
> > Signed-off-by: Mingwei Zhang <mizhang@xxxxxxxxxx>
> > ---
> > arch/x86/kvm/vmx/vmx.c | 11 +++++++++++
> > 1 file changed, 11 insertions(+)
> >
> > diff --git a/arch/x86/kvm/vmx/vmx.c b/arch/x86/kvm/vmx/vmx.c index
> > 40e3780d73ae..6d8667b56091 100644
> > --- a/arch/x86/kvm/vmx/vmx.c
> > +++ b/arch/x86/kvm/vmx/vmx.c
> > @@ -2049,6 +2049,17 @@ static int vmx_get_msr(struct kvm_vcpu *vcpu,
> > struct msr_data *msr_info)
> > msr_info->data = to_vmx(vcpu)->msr_ia32_sgxlepubkeyhash
> > [msr_info->index - MSR_IA32_SGXLEPUBKEYHASH0];
> > break;
> > + case MSR_IA32_PERF_CAPABILITIES:
> > + /*
> > + * Host VMM should not get potentially invalid MSR value if
> > vCPU
> > + * has already run but guest cpuid lacks the support for the
> > + * MSR.
> > + */
> > + if (msr_info->host_initiated &&
> > + kvm_vcpu_has_run(vcpu) &&
> > + !guest_cpuid_has(vcpu, X86_FEATURE_PDCM))
> > + return 1;

As Wei pointed out, this doesn't match the changelog. And I don't think this is
what we want, at least not in isolation. Making KVM more restrictive on userspace
reads doesn't solve the live migration save/restore issue, and the kvm_vcpu_has_run()
adds yet another flavor of MSR handling.

We discussed this whole MSRs mess at PUCK this morning. I forgot to hit RECORD,
but Paolo took notes and will post them soon.

Going from memory, the plan is to:

1. Commit to, and document, that userspace must do KVM_SET_CPUID{,2} prior to
KVM_SET_MSRS.

2. Go with roughly what I proposed in the CET thread (for unsupported MSRS,
read 0 and drop writes of '0')[*].

3. Add a quire for PERF_CAPABILITIES, ARCH_CAPABILITIES, and PLATFORM_INFO
(if quirk==enabled, keep KVM's current behavior; if quirk==disabled, zero-
initialize the MSRs).

With those pieces in place, KVM can simply check X86_FEATURE_PDCM for both reads
and writes to PERF_CAPABILITIES, and the common userspace MSR handling will
convert "unsupported" to "success" as appropriate.

[*] https://lore.kernel.org/all/ZfDdS8rtVtyEr0UR@xxxxxxxxxx