Re: [PATCH v7 03/19] KVM: x86/pmu: Remove KVM's enumeration of Intel's architectural encodings

From: Liang, Kan
Date: Wed Nov 08 2023 - 15:38:24 EST




On 2023-11-08 2:35 p.m., Sean Christopherson wrote:
> On Wed, Nov 08, 2023, Kan Liang wrote:
>> On 2023-11-07 7:31 p.m., Sean Christopherson wrote:
>>> @@ -442,8 +396,29 @@ static int intel_pmu_set_msr(struct kvm_vcpu *vcpu, struct msr_data *msr_info)
>>> return 0;
>>> }
>>>
>>> +/*
>>> + * Map fixed counter events to architectural general purpose event encodings.
>>> + * Perf doesn't provide APIs to allow KVM to directly program a fixed counter,
>>> + * and so KVM instead programs the architectural event to effectively request
>>> + * the fixed counter. Perf isn't guaranteed to use a fixed counter and may
>>> + * instead program the encoding into a general purpose counter, e.g. if a
>>> + * different perf_event is already utilizing the requested counter, but the end
>>> + * result is the same (ignoring the fact that using a general purpose counter
>>> + * will likely exacerbate counter contention).
>>> + *
>>> + * Note, reference cycles is counted using a perf-defined "psuedo-encoding",
>>> + * as there is no architectural general purpose encoding for reference cycles.
>>
>> It's not the case for the latest Intel platforms anymore. Please see
>> ffbe4ab0beda ("perf/x86/intel: Extend the ref-cycles event to GP counters").
>
> Ugh, yeah. But that should actually be easier to do on top.
>
>> Maybe perf should export .event_map to KVM somehow.
>
> Oh for ***** sake, perf already does export this for KVM. Untested, but the below
> should do the trick. If I need to spin another version of this series then I'll
> fold it in, otherwise I'll post it as something on top.
>
> There's also an optimization to be had for kvm_pmu_trigger_event(), which incurs
> an indirect branch not only every invocation, but on every iteration. I'll post
> this one separately.
>
> diff --git a/arch/x86/kvm/vmx/pmu_intel.c b/arch/x86/kvm/vmx/pmu_intel.c
> index 5fc5a62af428..a02e13c2e5e6 100644
> --- a/arch/x86/kvm/vmx/pmu_intel.c
> +++ b/arch/x86/kvm/vmx/pmu_intel.c
> @@ -405,25 +405,32 @@ static int intel_pmu_set_msr(struct kvm_vcpu *vcpu, struct msr_data *msr_info)
> * different perf_event is already utilizing the requested counter, but the end
> * result is the same (ignoring the fact that using a general purpose counter
> * will likely exacerbate counter contention).
> - *
> - * Note, reference cycles is counted using a perf-defined "psuedo-encoding",
> - * as there is no architectural general purpose encoding for reference cycles.
> */
> static u64 intel_get_fixed_pmc_eventsel(int index)
> {
> - const struct {
> - u8 eventsel;
> - u8 unit_mask;
> - } fixed_pmc_events[] = {
> - [0] = { 0xc0, 0x00 }, /* Instruction Retired / PERF_COUNT_HW_INSTRUCTIONS. */
> - [1] = { 0x3c, 0x00 }, /* CPU Cycles/ PERF_COUNT_HW_CPU_CYCLES. */
> - [2] = { 0x00, 0x03 }, /* Reference Cycles / PERF_COUNT_HW_REF_CPU_CYCLES*/
> + enum perf_hw_id perf_id;
> + u64 eventsel;
> +
> + BUILD_BUG_ON(KVM_PMC_MAX_FIXED != 3);
> +
> + switch (index) {
> + case 0:
> + perf_id = PERF_COUNT_HW_INSTRUCTIONS;
> + break;
> + case 1:
> + perf_id = PERF_COUNT_HW_CPU_CYCLES;
> + break;
> + case 2:
> + perf_id = PERF_COUNT_HW_REF_CPU_CYCLES;
> + break;
> + default:
> + WARN_ON_ONCE(1);
> + return 0;
> };
>
> - BUILD_BUG_ON(ARRAY_SIZE(fixed_pmc_events) != KVM_PMC_MAX_FIXED);
> -
> - return (fixed_pmc_events[index].unit_mask << 8) |
> - fixed_pmc_events[index].eventsel;
> + eventsel = perf_get_hw_event_config(perf_id);

Yes, perf_get_hw_event_config() returns the updated event encoding.

Thanks,
Kan

> + WARN_ON_ONCE(!eventsel && index < kvm_pmu_cap.num_counters_fixed);
> + return eventsel;
> }
>
> static void intel_pmu_refresh(struct kvm_vcpu *vcpu)
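
For anyone following along, here is a rough user-space sketch of the lookup
shape discussed above: a fixed counter index is mapped to a generic hardware
event ID, and the encoding is then fetched from a perf-owned table rather than
hard-coded in KVM. Everything prefixed with sketch_ is hypothetical and only
stands in for the kernel symbols; the stubbed table reuses the three encodings
from the removed fixed_pmc_events[] array, i.e. (unit_mask << 8) | eventsel,
so it does not reflect the newer ref-cycles encoding from ffbe4ab0beda.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-ins for the kernel's enum perf_hw_id values. */
enum sketch_hw_id {
	SKETCH_HW_CPU_CYCLES,
	SKETCH_HW_INSTRUCTIONS,
	SKETCH_HW_REF_CPU_CYCLES,
	SKETCH_HW_MAX,
};

/*
 * Hypothetical stub standing in for perf_get_hw_event_config().  The values
 * are the encodings from the table the patch removes, computed as
 * (unit_mask << 8) | eventsel.  An unknown ID yields 0.
 */
uint64_t sketch_get_hw_event_config(int id)
{
	static const uint64_t event_map[SKETCH_HW_MAX] = {
		[SKETCH_HW_CPU_CYCLES]     = 0x003c,
		[SKETCH_HW_INSTRUCTIONS]   = 0x00c0,
		[SKETCH_HW_REF_CPU_CYCLES] = 0x0300,
	};

	if (id < 0 || id >= SKETCH_HW_MAX)
		return 0;
	return event_map[id];
}

/* Map a fixed counter index to an event selector; 0 means "no mapping". */
uint64_t sketch_fixed_pmc_eventsel(int index)
{
	switch (index) {
	case 0:
		return sketch_get_hw_event_config(SKETCH_HW_INSTRUCTIONS);
	case 1:
		return sketch_get_hw_event_config(SKETCH_HW_CPU_CYCLES);
	case 2:
		return sketch_get_hw_event_config(SKETCH_HW_REF_CPU_CYCLES);
	default:
		return 0;
	}
}

/* Small self-check exercising the three fixed counters and a bad index. */
void sketch_self_check(void)
{
	assert(sketch_fixed_pmc_eventsel(0) == 0x00c0);
	assert(sketch_fixed_pmc_eventsel(1) == 0x003c);
	assert(sketch_fixed_pmc_eventsel(2) == 0x0300);
	assert(sketch_fixed_pmc_eventsel(7) == 0);
}
```

The point of the indirection is that only the table behind the lookup has to
change when a platform gains a new encoding; the index-to-ID switch stays as is.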