Re: [PATCH v7 03/19] KVM: x86/pmu: Remove KVM's enumeration of Intel's architectural encodings

From: Sean Christopherson
Date: Wed Nov 08 2023 - 14:35:10 EST


On Wed, Nov 08, 2023, Kan Liang wrote:
> On 2023-11-07 7:31 p.m., Sean Christopherson wrote:
> > @@ -442,8 +396,29 @@ static int intel_pmu_set_msr(struct kvm_vcpu *vcpu, struct msr_data *msr_info)
> > return 0;
> > }
> >
> > +/*
> > + * Map fixed counter events to architectural general purpose event encodings.
> > + * Perf doesn't provide APIs to allow KVM to directly program a fixed counter,
> > + * and so KVM instead programs the architectural event to effectively request
> > + * the fixed counter. Perf isn't guaranteed to use a fixed counter and may
> > + * instead program the encoding into a general purpose counter, e.g. if a
> > + * different perf_event is already utilizing the requested counter, but the end
> > + * result is the same (ignoring the fact that using a general purpose counter
> > + * will likely exacerbate counter contention).
> > + *
> > + * Note, reference cycles is counted using a perf-defined "psuedo-encoding",
> > + * as there is no architectural general purpose encoding for reference cycles.
>
> It's not the case for the latest Intel platforms anymore. Please see
> ffbe4ab0beda ("perf/x86/intel: Extend the ref-cycles event to GP counters").

Ugh, yeah. But that should actually be easier to do on top.

> Maybe perf should export .event_map to KVM somehow.

Oh for ***** sake, perf already does export this for KVM. Untested, but the below
should do the trick. If I need to spin another version of this series then I'll
fold it in, otherwise I'll post it as something on top.
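
For reference, the helper in question is perf_get_hw_event_config() (declared in
arch/x86/include/asm/perf_event.h). Going from memory, it's little more than a
bounds-checked lookup through x86_pmu.event_map, roughly:

	u64 perf_get_hw_event_config(int hw_event)
	{
		int max = x86_pmu.max_events;

		/* Unknown IDs yield 0, i.e. "no encoding available". */
		if (hw_event < max)
			return x86_pmu.event_map(array_index_nospec(hw_event, max));

		return 0;
	}

i.e. anything perf itself knows how to encode, including the ref-cycles change
above, gets picked up for free.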

There's also an optimization to be had for kvm_pmu_trigger_event(), which incurs
an indirect branch not only on every invocation, but on every iteration. I'll post
this one separately.
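
To illustrate the shape of the problem (a hypothetical kernel-style sketch with
made-up names and a u64 type, not the actual KVM code): the expensive pattern is
an indirect call inside the per-counter loop, versus indexing the counter array
directly.

	#define NR_COUNTERS 8

	struct pmc { u64 eventsel; };

	struct pmu {
		struct pmc counters[NR_COUNTERS];
		struct pmc *(*idx_to_pmc)(struct pmu *pmu, int idx);
	};

	/* Current shape: an indirect branch taken on every iteration. */
	static int count_matches_indirect(struct pmu *pmu, u64 eventsel)
	{
		int i, n = 0;

		for (i = 0; i < NR_COUNTERS; i++)
			n += pmu->idx_to_pmc(pmu, i)->eventsel == eventsel;
		return n;
	}

	/* Cheaper shape: walk the array directly, no indirect branches. */
	static int count_matches_direct(struct pmu *pmu, u64 eventsel)
	{
		int i, n = 0;

		for (i = 0; i < NR_COUNTERS; i++)
			n += pmu->counters[i].eventsel == eventsel;
		return n;
	}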

diff --git a/arch/x86/kvm/vmx/pmu_intel.c b/arch/x86/kvm/vmx/pmu_intel.c
index 5fc5a62af428..a02e13c2e5e6 100644
--- a/arch/x86/kvm/vmx/pmu_intel.c
+++ b/arch/x86/kvm/vmx/pmu_intel.c
@@ -405,25 +405,32 @@ static int intel_pmu_set_msr(struct kvm_vcpu *vcpu, struct msr_data *msr_info)
* different perf_event is already utilizing the requested counter, but the end
* result is the same (ignoring the fact that using a general purpose counter
* will likely exacerbate counter contention).
- *
- * Note, reference cycles is counted using a perf-defined "psuedo-encoding",
- * as there is no architectural general purpose encoding for reference cycles.
*/
static u64 intel_get_fixed_pmc_eventsel(int index)
{
- const struct {
- u8 eventsel;
- u8 unit_mask;
- } fixed_pmc_events[] = {
- [0] = { 0xc0, 0x00 }, /* Instruction Retired / PERF_COUNT_HW_INSTRUCTIONS. */
- [1] = { 0x3c, 0x00 }, /* CPU Cycles/ PERF_COUNT_HW_CPU_CYCLES. */
- [2] = { 0x00, 0x03 }, /* Reference Cycles / PERF_COUNT_HW_REF_CPU_CYCLES*/
+ enum perf_hw_id perf_id;
+ u64 eventsel;
+
+ BUILD_BUG_ON(KVM_PMC_MAX_FIXED != 3);
+
+ switch (index) {
+ case 0:
+ perf_id = PERF_COUNT_HW_INSTRUCTIONS;
+ break;
+ case 1:
+ perf_id = PERF_COUNT_HW_CPU_CYCLES;
+ break;
+ case 2:
+ perf_id = PERF_COUNT_HW_REF_CPU_CYCLES;
+ break;
+ default:
+ WARN_ON_ONCE(1);
+ return 0;
};

- BUILD_BUG_ON(ARRAY_SIZE(fixed_pmc_events) != KVM_PMC_MAX_FIXED);
-
- return (fixed_pmc_events[index].unit_mask << 8) |
- fixed_pmc_events[index].eventsel;
+ eventsel = perf_get_hw_event_config(perf_id);
+ WARN_ON_ONCE(!eventsel && index < kvm_pmu_cap.num_counters_fixed);
+ return eventsel;
}

static void intel_pmu_refresh(struct kvm_vcpu *vcpu)