Re: [PATCH v2] KVM: x86/pmu: Disable all vPMU features support on Intel hybrid CPUs

From: Like Xu
Date: Fri Feb 03 2023 - 05:09:11 EST


On 3/2/2023 2:06 am, Sean Christopherson wrote:
On Thu, Feb 02, 2023, Like Xu wrote:
On 1/2/2023 12:02 am, Sean Christopherson wrote:
diff --git a/arch/x86/kvm/pmu.h b/arch/x86/kvm/pmu.h
index 79988dafb15b..6a3995657e1e 100644
--- a/arch/x86/kvm/pmu.h
+++ b/arch/x86/kvm/pmu.h
@@ -166,9 +166,11 @@ static inline void kvm_init_pmu_capability(const struct kvm_pmu_ops *pmu_ops)
/*
* For Intel, only support guest architectural pmu
- * on a host with architectural pmu.
+ * on a non-hybrid host with architectural pmu.
*/
- if ((is_intel && !kvm_pmu_cap.version) || !kvm_pmu_cap.num_counters_gp)
+ if (!kvm_pmu_cap.num_counters_gp ||
+ (is_intel && (!kvm_pmu_cap.version ||
+ boot_cpu_has(X86_FEATURE_HYBRID_CPU))))

Why do this here instead of in perf_get_x86_pmu_capability()[*]? The issue isn't
restricted to Intel CPUs, it just so happens that Intel is the only x86 vendor
that has shipped hybrid CPUs/PMUs. Similarly, it's entirely possible to create a
hybrid CPU with a fully homogeneous PMU. IMO KVM should rely on the PMU's is_hybrid()
and not the generic X86_FEATURE_HYBRID_CPU flag.

[*] https://lore.kernel.org/all/20230120004051.2043777-1-seanjc@xxxxxxxxxx
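
For illustration only, the difference between gating on the CPU feature flag
and gating on the PMU's own hybrid state can be sketched in userspace C. This
is not kernel code: struct mock_host, struct mock_pmu_cap and both helper
functions are hypothetical names, and the two booleans merely stand in for
boot_cpu_has(X86_FEATURE_HYBRID_CPU) and the PMU's is_hybrid().

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for the two kvm_pmu_cap fields used in the diff. */
struct mock_pmu_cap {
	int version;
	int num_counters_gp;
};

/* Hypothetical host description; none of these are kernel types. */
struct mock_host {
	bool is_intel;
	bool cpu_is_hybrid;	/* models boot_cpu_has(X86_FEATURE_HYBRID_CPU) */
	bool pmu_is_hybrid;	/* models the PMU's is_hybrid() */
};

/* Mirrors the condition in the v2 diff: Intel only, keyed on the CPU flag. */
static bool vpmu_allowed_by_cpu_flag(const struct mock_host *h,
				     const struct mock_pmu_cap *cap)
{
	assert(h != NULL && cap != NULL);

	if (!cap->num_counters_gp ||
	    (h->is_intel && (!cap->version || h->cpu_is_hybrid)))
		return false;
	return true;
}

/* Models the suggested alternative: vendor-agnostic, keyed on the PMU. */
static bool vpmu_allowed_by_pmu_flag(const struct mock_host *h,
				     const struct mock_pmu_cap *cap)
{
	assert(h != NULL && cap != NULL);

	if (h->pmu_is_hybrid)
		return false;
	if (!cap->num_counters_gp || (h->is_intel && !cap->version))
		return false;
	return true;
}
```

Under these assumptions the two checks disagree in exactly the two cases
raised above: a hybrid CPU with a homogeneous PMU (only the CPU-flag check
rejects it) and a non-Intel host with a hybrid PMU (only the PMU check
rejects it).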

As of today, other x86 vendors do not have hybrid core products on their
road maps. Until a virtual hybrid vCPU model is implemented, there is
no practical value in discussing a homogeneous PMU on a hybrid vCPU.

Why not? I assume Intel put a fair bit of effort into ensuring feature parity
between P and E cores. Other than time, money, and effort, I don't see any
reason why Intel can't do the same for the PMU.

I asked the same question when I last had access to hybrid cores and
was told that it wouldn't happen for PMU capabilities, since different PMU
events on different CPU types imply micro-architectural differences between
big and little cores, and even harmonizing the event encoding is difficult
to achieve in the short term.


The perf interface only provides host PMU capabilities; the logic for
choosing to disable (or enable) the vPMU based on perf input should stay
in KVM so that subsequent development work can add most of its code
to KVM alone, which makes it much easier for downstream users to upgrade
the loadable KVM module rather than the entire core kernel.

My experience interacting with the perf subsystem has taught me that
any perf changes required by KVM should be kept as small as possible.

I don't disagree, but I don't think that's relevant in this case. Perf doesn't
provide the necessary bits for KVM to virtualize a hybrid PMU, so unless KVM is
somehow able to get away with enumerating a very stripped down vPMU, additional
modifications to perf_get_x86_pmu_capability() will be required.

What I care more about though is this ugliness in perf_get_x86_pmu_capability():

/*
* KVM doesn't support the hybrid PMU yet.
* Return the common value in global x86_pmu,
* which available for all cores.

With the current code base, I would have expected the vPMU (excluding PEBS,
LBR, and Intel PT) to continue to work on any type of pCPU until you decided
to disable it completely.

Moreover, KVM may not be the only caller of perf_get_x86_pmu_capability();
technically, eBPF helpers could call it too. The comment changes from v1 can be
applied to this version (restricting them to KVM semantics), which makes the
status quo clearer to KVM users.

*/
cap->num_counters_gp = x86_pmu.num_counters;

I really don't want to leave that comment lying around as it's flat out wrong in
that it obviously doesn't address the other differences beyond the number of
counters. And since there are dependencies on perf, my preference is to disable
PMU enumeration in perf specifically so that whoever takes on vPMU enabling is
forced to consider the perf side of things, and get buy in from the perf folks.
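
As a rough userspace sketch of the layering described above (not the actual
perf or KVM code): if the perf-side capability query reports nothing for a
hybrid PMU, the existing KVM-side check disables the vPMU without any
hybrid-specific logic in KVM. struct mock_x86_pmu, struct mock_pmu_capability
and both function names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical model of the host PMU state that perf knows about. */
struct mock_x86_pmu {
	bool initialized;
	bool hybrid;		/* models the PMU's is_hybrid() */
	int version;
	int num_counters;
};

/* Hypothetical model of the capability struct handed to KVM. */
struct mock_pmu_capability {
	int version;
	int num_counters_gp;
};

/* Perf side: export nothing when the PMU is uninitialized or hybrid. */
static void mock_get_pmu_capability(const struct mock_x86_pmu *pmu,
				    struct mock_pmu_capability *cap)
{
	assert(pmu != NULL && cap != NULL);

	memset(cap, 0, sizeof(*cap));
	if (!pmu->initialized || pmu->hybrid)
		return;

	cap->version = pmu->version;
	cap->num_counters_gp = pmu->num_counters;
}

/* KVM side: same shape as the pre-patch condition in the diff. */
static bool mock_kvm_enable_pmu(const struct mock_pmu_capability *cap,
				bool is_intel)
{
	assert(cap != NULL);

	return !((is_intel && !cap->version) || !cap->num_counters_gp);
}
```

In this model the zeroed capabilities are what turn the vPMU off, so anyone
enabling a hybrid vPMU later would have to change the perf-side function
first.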

perf_get_x86_pmu_capability() obviously needs to be revamped, but until
real KVM enabling work arrives, any inconsequential intrusion into
perf/core code will only create trivial maintenance churn.