Re: [PATCH v2] KVM: x86/pmu: Disable all vPMU features support on Intel hybrid CPUs

From: Sean Christopherson
Date: Fri Feb 03 2023 - 12:28:25 EST


On Fri, Feb 03, 2023, Like Xu wrote:
> On 3/2/2023 2:06 am, Sean Christopherson wrote:
> > On Thu, Feb 02, 2023, Like Xu wrote:
> > > On 1/2/2023 12:02 am, Sean Christopherson wrote:
> > > The perf interface only provides host PMU capabilities and the logic for
> > > choosing to disable (or enable) vPMU based on perf input should be left
> > > in the KVM part so that subsequent development work can add most code
> > > to the just KVM, which is very helpful for downstream users to upgrade
> > > loadable KVM module rather than the entire core kernel.
> > >
> > > My experience interacting with the perf subsystem has taught me that
> > > perf change required from KVM should be made as small as possible.
> >
> > I don't disagree, but I don't think that's relevant in this case. Perf doesn't
> > provide the necessary bits for KVM to virtualize a hybrid PMU, so unless KVM is
> > somehow able to get away with enumerating a very stripped down vPMU, additional
> > modifications to perf_get_x86_pmu_capability() will be required.
> >
> > What I care more about though is this ugliness in perf_get_x86_pmu_capability():
> >
> > /*
> > * KVM doesn't support the hybrid PMU yet.
> > * Return the common value in global x86_pmu,
> > * which available for all cores.
>
> I would have expected w/ current code base, vpmu (excluding pebs and lbr, intel_pt)
> to continue to work on any type of pCPU until you decide to disable them completely.

Didn't follow this.

> Moreover, the caller of perf_get_x86_pmu_capability() may be more than just KVM,
> it may be technically ebpf helpers. The diff on comments from v1 can be applied to
> this version (restrict KVM semantics), and it makes the status quo clearer
> to KVM users.

In that case, eBPF is just as hosed, no? And given that the only people that have
touched perf_get_x86_pmu_capability() in its 11+ years of existence are all KVM
people, I have a hard time believing there is meaningful use outside of KVM.

> > */
> > cap->num_counters_gp = x86_pmu.num_counters;
> >
> > I really don't want to leave that comment lying around as it's flat out wrong in
> > that it obviously doesn't address the other differences beyond the number of
> > counters. And since there are dependencies on perf, my preference is to disable
> > PMU enumeration in perf specifically so that whoever takes on vPMU enabling is
> > forced to consider the perf side of things, and get buy in from the perf folks.
>
> The perf_get_x86_pmu_capability() obviously needs to be revamped,
> but until real effective KVM enabling work arrives, any inconsequential intrusion
> into perf/core code will only lead to trivial system maintenance.

Trivial doesn't mean useless or unnecessary though. IMO, there's value in capturing,
in code, that perf_get_x86_pmu_capability() doesn't properly support hybrid vPMUs.

That said, poking around perf, checking is_hybrid() is wrong. This quirk suggests
that if E-cores are disabled via BIOS, (a) X86_FEATURE_HYBRID_CPU is _supposed_ to
be cleared, and (b) the base PMU will reflect the P-core PMU. I.e. someone can
enable vPMU by disabling E-cores.

/*
* Quirk: For some Alder Lake machine, when all E-cores are disabled in
* a BIOS, the leaf 0xA will enumerate all counters of P-cores. However,
* the X86_FEATURE_HYBRID_CPU is still set. The above codes will
* mistakenly add extra counters for P-cores. Correct the number of
* counters here.
*/
if ((pmu->num_counters > 8) || (pmu->num_counters_fixed > 4)) {
pmu->num_counters = x86_pmu.num_counters;
pmu->num_counters_fixed = x86_pmu.num_counters_fixed;
}

Side topic, someone (*cough* Intel) should fix that, e.g. detect the scenario
during boot and manually clear X86_FEATURE_HYBRID_CPU.

I'm also ok explicitly disabling support in KVM, but since we need to update
perf as well (that KVM comment needs to go), I don't see any reason not to also
update perf_get_x86_pmu_capability().

How about this? Maybe split over two patches to separate the KVM and perf changes?

diff --git a/arch/x86/events/core.c b/arch/x86/events/core.c
index 85a63a41c471..d096b04bf80e 100644
--- a/arch/x86/events/core.c
+++ b/arch/x86/events/core.c
@@ -2974,17 +2974,19 @@ unsigned long perf_misc_flags(struct pt_regs *regs)

void perf_get_x86_pmu_capability(struct x86_pmu_capability *cap)
{
- if (!x86_pmu_initialized()) {
+ /* This API doesn't currently support enumerating hybrid PMUs. */
+ if (WARN_ON_ONCE(cpu_feature_enabled(X86_FEATURE_HYBRID_CPU)) ||
+ !x86_pmu_initialized()) {
memset(cap, 0, sizeof(*cap));
return;
}

+ /*
+ * Note, hybrid CPU models get tracked as having hybrid PMUs even when
+ * all E-cores are disabled via BIOS. When E-cores are disabled, the
+ * base PMU holds the correct number of counters for P-cores.
+ */
cap->version = x86_pmu.version;
- /*
- * KVM doesn't support the hybrid PMU yet.
- * Return the common value in global x86_pmu,
- * which available for all cores.
- */
cap->num_counters_gp = x86_pmu.num_counters;
cap->num_counters_fixed = x86_pmu.num_counters_fixed;
cap->bit_width_gp = x86_pmu.cntval_bits;
diff --git a/arch/x86/kvm/pmu.h b/arch/x86/kvm/pmu.h
index cdb91009701d..933165663703 100644
--- a/arch/x86/kvm/pmu.h
+++ b/arch/x86/kvm/pmu.h
@@ -165,15 +165,27 @@ static inline void kvm_init_pmu_capability(void)
{
bool is_intel = boot_cpu_data.x86_vendor == X86_VENDOR_INTEL;

- perf_get_x86_pmu_capability(&kvm_pmu_cap);
-
- /*
- * For Intel, only support guest architectural pmu
- * on a host with architectural pmu.
- */
- if ((is_intel && !kvm_pmu_cap.version) || !kvm_pmu_cap.num_counters_gp)
+ /*
+ * Hybrid PMUs don't play nice with virtualization unless userspace
+ * pins vCPUs _and_ can enumerate accurate informations to the guest.
+ * Disable vPMU support for hybrid PMUs until KVM gains a way to let
+ * userspace opt into the dangers of hybrid vPMUs.
+ */
+ if (cpu_feature_enabled(X86_FEATURE_HYBRID_CPU))
enable_pmu = false;

+ if (enable_pmu) {
+ perf_get_x86_pmu_capability(&kvm_pmu_cap);
+
+ /*
+ * For Intel, only support guest architectural pmu
+ * on a host with architectural pmu.
+ */
+ if ((is_intel && !kvm_pmu_cap.version) ||
+ !kvm_pmu_cap.num_counters_gp)
+ enable_pmu = false;
+ }
+
if (!enable_pmu) {
memset(&kvm_pmu_cap, 0, sizeof(kvm_pmu_cap));
return;