Re: [PATCH v6 03/20] KVM: x86/pmu: Don't enumerate arch events KVM doesn't support

From: Sean Christopherson
Date: Tue Nov 07 2023 - 12:39:54 EST


On Tue, Nov 07, 2023, Dapeng Mi wrote:
>
> On 11/4/2023 8:41 PM, Jim Mattson wrote:
> > On Fri, Nov 3, 2023 at 5:02 PM Sean Christopherson <seanjc@xxxxxxxxxx> wrote:
> > > Don't advertise support to userspace for architectural events that KVM
> > > doesn't support, i.e. for "real" events that aren't listed in
> > > intel_pmu_architectural_events. On current hardware, this effectively
> > > means "don't advertise support for Top Down Slots".
> > NR_REAL_INTEL_ARCH_EVENTS is only used in intel_hw_event_available().
> > As discussed (https://lore.kernel.org/kvm/ZUU12-TUR_1cj47u@xxxxxxxxxx/),
> > intel_hw_event_available() should go away.
> >
> > Aside from mapping fixed counters to event selector and unit mask
> > (fixed_pmc_events[]), KVM has no reason to know when a new
> > architectural event is defined.
>
>
> Since intel_hw_event_available() would be removed, it looks the enum
> intel_pmu_architectural_events and intel_arch_events[] array become useless.
> We can directly simply modify current fixed_pmc_events[] array and use it to
> store fixed counter events code and umask.

Yep, I came to the same conclusion. This is what I ended up with yesterday:

/*
* Map fixed counter events to architectural general purpose event encodings.
* Perf doesn't provide APIs to allow KVM to directly program a fixed counter,
* and so KVM instead programs the architectural event to effectively request
* the fixed counter. Perf isn't guaranteed to use a fixed counter and may
* instead program the encoding into a general purpose counter, e.g. if a
* different perf_event is already utilizing the requested counter, but the end
* result is the same (ignoring the fact that using a general purpose counter
* will likely exacerbate counter contention).
*
* Note, reference cycles is counted using a perf-defined "psuedo-encoding",
* there is no architectural general purpose encoding for reference TSC cycles.
*/
static u64 intel_get_fixed_pmc_eventsel(int index)
{
const struct {
u8 eventsel;
u8 unit_mask;
} fixed_pmc_events[] = {
[0] = { 0xc0, 0x00 }, /* Instruction Retired / PERF_COUNT_HW_INSTRUCTIONS. */
[1] = { 0x3c, 0x00 }, /* CPU Cycles/ PERF_COUNT_HW_CPU_CYCLES. */
[2] = { 0x00, 0x03 }, /* Reference TSC Cycles / PERF_COUNT_HW_REF_CPU_CYCLES*/
};

BUILD_BUG_ON(ARRAY_SIZE(fixed_pmc_events) != KVM_PMC_MAX_FIXED);

return (fixed_pmc_events[index].unit_mask << 8) |
fixed_pmc_events[index].eventsel;
}

...

static void intel_pmu_init(struct kvm_vcpu *vcpu)
{
int i;
struct kvm_pmu *pmu = vcpu_to_pmu(vcpu);
struct lbr_desc *lbr_desc = vcpu_to_lbr_desc(vcpu);

for (i = 0; i < KVM_INTEL_PMC_MAX_GENERIC; i++) {
pmu->gp_counters[i].type = KVM_PMC_GP;
pmu->gp_counters[i].vcpu = vcpu;
pmu->gp_counters[i].idx = i;
pmu->gp_counters[i].current_config = 0;
}

for (i = 0; i < KVM_PMC_MAX_FIXED; i++) {
pmu->fixed_counters[i].type = KVM_PMC_FIXED;
pmu->fixed_counters[i].vcpu = vcpu;
pmu->fixed_counters[i].idx = i + INTEL_PMC_IDX_FIXED;
pmu->fixed_counters[i].current_config = 0;
pmu->fixed_counters[i].eventsel = intel_get_fixed_pmc_eventsel(i);
}

lbr_desc->records.nr = 0;
lbr_desc->event = NULL;
lbr_desc->msr_passthrough = false;
}