Re: [PATCH] KVM: x86/svm/pmu: Set PerfMonV2 global control bits correctly

From: Like Xu
Date: Tue Mar 05 2024 - 22:23:23 EST




On 6/3/2024 10:32 am, Mi, Dapeng wrote:

On 3/6/2024 2:04 AM, Sean Christopherson wrote:
On Tue, Mar 05, 2024, Sean Christopherson wrote:
On Tue, Mar 05, 2024, Like Xu wrote:
diff --git a/arch/x86/kvm/pmu.c b/arch/x86/kvm/pmu.c
index 87cc6c8809ad..f61ce26aeb90 100644
--- a/arch/x86/kvm/pmu.c
+++ b/arch/x86/kvm/pmu.c
@@ -741,6 +741,8 @@ static void kvm_pmu_reset(struct kvm_vcpu *vcpu)
   */
  void kvm_pmu_refresh(struct kvm_vcpu *vcpu)
  {
+    struct kvm_pmu *pmu = vcpu_to_pmu(vcpu);
+
      if (KVM_BUG_ON(kvm_vcpu_has_run(vcpu), vcpu->kvm))
          return;
@@ -750,8 +752,18 @@ void kvm_pmu_refresh(struct kvm_vcpu *vcpu)
       */
      kvm_pmu_reset(vcpu);
-    bitmap_zero(vcpu_to_pmu(vcpu)->all_valid_pmc_idx, X86_PMC_IDX_MAX);
+    bitmap_zero(pmu->all_valid_pmc_idx, X86_PMC_IDX_MAX);
      static_call(kvm_x86_pmu_refresh)(vcpu);
+
+    /*
+     * At RESET, both Intel and AMD CPUs set all enable bits for general
+     * purpose counters in IA32_PERF_GLOBAL_CTRL (so that software that
+     * was written for v1 PMUs don't unknowingly leave GP counters disabled
+     * in the global controls).  Emulate that behavior when refreshing the
+     * PMU so that userspace doesn't need to manually set PERF_GLOBAL_CTRL.
+     */
+    if (kvm_pmu_has_perf_global_ctrl(pmu))
+        pmu->global_ctrl = GENMASK_ULL(pmu->nr_arch_gp_counters - 1, 0);
  }
Doh, this is based on kvm/kvm-uapi, I'll rebase to kvm-x86/next before posting.

I'll also update the changelog to call out that KVM has always clobbered global_ctrl
during PMU refresh, i.e. there is no danger of breaking existing setups by
clobbering a value set by userspace, e.g. during live migration.

Lastly, I'll also update the changelog to call out that KVM *did* actually set
the general purpose counter enable bits in global_ctrl at "RESET" until v6.0,
and that KVM intentionally removed that behavior because of what appears to be
an Intel SDM bug.

Of course, in typical KVM fashion, that old code was also broken in its own way
(the history of this code is a comedy of errors).  Initial vPMU support in commit
f5132b01386b ("KVM: Expose a version 2 architectural PMU to a guests") *almost*
got it right, but for some reason only set the bits if the guest PMU was
advertised as v1:

         if (pmu->version == 1) {
                 pmu->global_ctrl = (1 << pmu->nr_arch_gp_counters) - 1;
                 return;
         }


Commit f19a0c2c2e6a ("KVM: PMU emulation: GLOBAL_CTRL MSR should be enabled on
reset") then tried to remedy that goof, but botched things and also enabled the
fixed counters:

         pmu->global_ctrl = ((1 << pmu->nr_arch_gp_counters) - 1) |
                 (((1ull << pmu->nr_arch_fixed_counters) - 1) << X86_PMC_IDX_FIXED);
         pmu->global_ctrl_mask = ~pmu->global_ctrl;

Which was KVM's behavior up until commit c49467a45fe0 ("KVM: x86/pmu: Don't overwrite
the pmu->global_ctrl when refreshing") incorrectly removed *everything*.  Very
ironically, that commit came from Like.

Author: Like Xu <likexu@xxxxxxxxxxx>
Date:   Tue May 10 12:44:07 2022 +0800

     KVM: x86/pmu: Don't overwrite the pmu->global_ctrl when refreshing
     Assigning a value to pmu->global_ctrl just to set the value of
     pmu->global_ctrl_mask is more readable but does not conform to the
     specification. The value is reset to zero on Power up and Reset but
     stays unchanged on INIT, like most other MSRs.

But wait, it gets even better.  Like wasn't making up that behavior, Intel's SDM
circa December 2022 states that "Global Perf Counter Controls" is '0' at Power-Up
and RESET.  But then the March 2023 SDM rolls out and says

   IA32_PERF_GLOBAL_CTRL: Sets bits n-1:0 and clears the upper bits.

So presumably someone at Intel noticed that what their CPUs do and what the
documentation says didn't match.

This reminds me quite a bit of the past, where this kind of thing happened
occasionally (e.g. some pmu events don't count as expected, and ucode
updates often sneak in to change hardware behaviour). Sometimes we have
to rely on hardware behaviour, sometimes we have to trust the documentation
specification, my experience has been to find a balance that is more
favourable to the end-user and to force the hardware vendors to make
more careful documentation and implementation. :D


It's a really long story... thanks for figuring it out.


*sigh* >>