Re: [PATCH] KVM: x86/svm/pmu: Set PerfMonV2 global control bits correctly

From: Like Xu
Date: Mon Mar 04 2024 - 21:32:10 EST


On 5/3/2024 3:46 am, Sean Christopherson wrote:
On Mon, Mar 04, 2024, Dapeng Mi wrote:

On 3/1/2024 5:00 PM, Sandipan Das wrote:
On 3/1/2024 2:07 PM, Like Xu wrote:
On 1/3/2024 3:50 pm, Sandipan Das wrote:
With PerfMonV2, a performance monitoring counter starts operating
only when both the PERF_CTLx enable bit and the corresponding
PerfCntrGlobalCtl enable bit are set.

When the PerfMonV2 CPUID feature bit (leaf 0x80000022 EAX bit 0) is set
for a guest but the guest kernel does not support PerfMonV2 (such as
kernels older than v5.19), the guest counters do not count since the
PerfCntrGlobalCtl MSR is initialized to zero and the guest kernel never
writes to it.
If the vCPU has the PerfMonV2 feature, it should not behave the way a legacy
PMU does. Users need the new driver to operate the new hardware, don't they?
One practical approach is for the hypervisor to simply not set the PerfMonV2
bit for such an unpatched (pre-v5.19) guest.

My understanding is that the legacy method of managing the counters should
still work because the enable bits in PerfCntrGlobalCtl are expected to be
set. The AMD PPR does mention that the PerfCntrEn bitfield of PerfCntrGlobalCtl
is set to 0x3f after a system reset. That way, the guest kernel can use either
interface.

If so, please add the PPR description here as comments.

Or even better, make that architectural behavior that's documented in the APM.

On the AMD side, we can't even assume that "PerfMonV3" will be compatible with
"PerfMonV2" without APM clarification, which is a concern for both the driver
and the virtualization implementation.


---
  arch/x86/kvm/svm/pmu.c | 1 +
  1 file changed, 1 insertion(+)

diff --git a/arch/x86/kvm/svm/pmu.c b/arch/x86/kvm/svm/pmu.c
index b6a7ad4d6914..14709c564d6a 100644
--- a/arch/x86/kvm/svm/pmu.c
+++ b/arch/x86/kvm/svm/pmu.c
@@ -205,6 +205,7 @@ static void amd_pmu_refresh(struct kvm_vcpu *vcpu)
      if (pmu->version > 1) {
          pmu->global_ctrl_mask = ~((1ull << pmu->nr_arch_gp_counters) - 1);
          pmu->global_status_mask = pmu->global_ctrl_mask;
+        pmu->global_ctrl = ~pmu->global_ctrl_mask;

It seems easier to understand to calculate global_ctrl first and
then derive global_ctrl_mask from it (negative logic).

Hrm, I'm torn. On one hand, awful name aside (global_ctrl_mask should really be
something like global_ctrl_rsvd_bits), the computation of the reserved bits should
come from the capabilities of the PMU, not from the RESET value.

On the other hand, setting _all_ non-reserved bits will likely do the wrong thing
if AMD ever adds bits in PerfCntGlobalCtl that aren't tied to general purpose
counters. But, that's a future theoretical problem, so I'm inclined to vote for
Sandipan's approach.

I suspect that Intel hardware also has this behaviour [*] although guest
kernels using Intel pmu version 1 are pretty much non-existent.

[*] Table 10-1. IA-32 and Intel® 64 Processor States Following Power-up, Reset, or INIT (Contd.)

We need to update the selftests to guard against this.


diff --git a/arch/x86/kvm/svm/pmu.c b/arch/x86/kvm/svm/pmu.c
index e886300f0f97..7ac9b080aba6 100644
--- a/arch/x86/kvm/svm/pmu.c
+++ b/arch/x86/kvm/svm/pmu.c
@@ -199,7 +199,8 @@ static void amd_pmu_refresh(struct kvm_vcpu *vcpu)
kvm_pmu_cap.num_counters_gp);

        if (pmu->version > 1) {
-               pmu->global_ctrl_mask = ~((1ull << pmu->nr_arch_gp_counters) - 1);
+               pmu->global_ctrl = (1ull << pmu->nr_arch_gp_counters) - 1;
+               pmu->global_ctrl_mask = ~pmu->global_ctrl;
                pmu->global_status_mask = pmu->global_ctrl_mask;
        }

        pmu->counter_bitmask[KVM_PMC_GP] = ((u64)1 << 48) - 1;