Re: [PATCH] KVM: x86/svm/pmu: Set PerfMonV2 global control bits correctly

From: Sean Christopherson
Date: Tue Mar 05 2024 - 13:04:47 EST


On Tue, Mar 05, 2024, Sean Christopherson wrote:
> On Tue, Mar 05, 2024, Like Xu wrote:
> diff --git a/arch/x86/kvm/pmu.c b/arch/x86/kvm/pmu.c
> index 87cc6c8809ad..f61ce26aeb90 100644
> --- a/arch/x86/kvm/pmu.c
> +++ b/arch/x86/kvm/pmu.c
> @@ -741,6 +741,8 @@ static void kvm_pmu_reset(struct kvm_vcpu *vcpu)
> */
> void kvm_pmu_refresh(struct kvm_vcpu *vcpu)
> {
> + struct kvm_pmu *pmu = vcpu_to_pmu(vcpu);
> +
> if (KVM_BUG_ON(kvm_vcpu_has_run(vcpu), vcpu->kvm))
> return;
>
> @@ -750,8 +752,18 @@ void kvm_pmu_refresh(struct kvm_vcpu *vcpu)
> */
> kvm_pmu_reset(vcpu);
>
> - bitmap_zero(vcpu_to_pmu(vcpu)->all_valid_pmc_idx, X86_PMC_IDX_MAX);
> + bitmap_zero(pmu->all_valid_pmc_idx, X86_PMC_IDX_MAX);
> static_call(kvm_x86_pmu_refresh)(vcpu);
> +
> + /*
> + * At RESET, both Intel and AMD CPUs set all enable bits for general
> + * purpose counters in IA32_PERF_GLOBAL_CTRL (so that software that
> + * was written for v1 PMUs don't unknowingly leave GP counters disabled
> + * in the global controls). Emulate that behavior when refreshing the
> + * PMU so that userspace doesn't need to manually set PERF_GLOBAL_CTRL.
> + */
> + if (kvm_pmu_has_perf_global_ctrl(pmu))
> + pmu->global_ctrl = GENMASK_ULL(pmu->nr_arch_gp_counters - 1, 0);
> }

Doh, this is based on kvm/kvm-uapi, I'll rebase to kvm-x86/next before posting.

I'll also update the changelog to call out that KVM has always clobbered global_ctrl
during PMU refresh, i.e. there is no danger of breaking existing setups by
clobbering a value set by userspace, e.g. during live migration.

Lastly, I'll also update the changelog to call out that KVM *did* actually set
the general purpose counter enable bits in global_ctrl at "RESET" until v6.0,
and that KVM intentionally removed that behavior because of what appears to be
an Intel SDM bug.

Of course, in typical KVM fashion, that old code was also broken in its own way
(the history of this code is a comedy of errors). Initial vPMU support in commit
f5132b01386b ("KVM: Expose a version 2 architectural PMU to a guests") *almost*
got it right, but for some reason only set the bits if the guest PMU was
advertised as v1:

if (pmu->version == 1) {
pmu->global_ctrl = (1 << pmu->nr_arch_gp_counters) - 1;
return;
}


Commit f19a0c2c2e6a ("KVM: PMU emulation: GLOBAL_CTRL MSR should be enabled on
reset") then tried to remedy that goof, but botched things and also enabled the
fixed counters:

pmu->global_ctrl = ((1 << pmu->nr_arch_gp_counters) - 1) |
(((1ull << pmu->nr_arch_fixed_counters) - 1) << X86_PMC_IDX_FIXED);
pmu->global_ctrl_mask = ~pmu->global_ctrl;

Which was KVM's behavior up until commit c49467a45fe0 ("KVM: x86/pmu: Don't overwrite
the pmu->global_ctrl when refreshing") incorrectly removed *everything*. Very
ironically, that commit came from Like.

Author: Like Xu <likexu@xxxxxxxxxxx>
Date: Tue May 10 12:44:07 2022 +0800

KVM: x86/pmu: Don't overwrite the pmu->global_ctrl when refreshing

Assigning a value to pmu->global_ctrl just to set the value of
pmu->global_ctrl_mask is more readable but does not conform to the
specification. The value is reset to zero on Power up and Reset but
stays unchanged on INIT, like most other MSRs.

But wait, it gets even better. Like wasn't making up that behavior, Intel's SDM
circa December 2022 states that "Global Perf Counter Controls" is '0' at Power-Up
and RESET. But then the March 2023 SDM rolls out and says

IA32_PERF_GLOBAL_CTRL: Sets bits n-1:0 and clears the upper bits.

So presumably someone at Intel noticed that what their CPUs do and what the
documentation says didn't match.

*sigh*