Re: [PATCH v4 5/9] KVM: selftests: Test Intel PMU architectural events on fixed counters

From: Sean Christopherson
Date: Thu Oct 19 2023 - 19:55:36 EST


On Mon, Sep 11, 2023, Jinrong Liang wrote:
> From: Jinrong Liang <cloudliang@xxxxxxxxxxx>
>
> Update test to cover Intel PMU architectural events on fixed counters.
> Per Intel SDM, PMU users can also count architecture performance events
> on fixed counters (specifically, FIXED_CTR0 for the retired instructions
> and FIXED_CTR1 for cpu core cycles event). Therefore, if guest's CPUID
> indicates that an architecture event is not available, the corresponding
> fixed counter will also not count that event.
>
> Co-developed-by: Like Xu <likexu@xxxxxxxxxxx>
> Signed-off-by: Like Xu <likexu@xxxxxxxxxxx>
> Signed-off-by: Jinrong Liang <cloudliang@xxxxxxxxxxx>
> ---
> .../selftests/kvm/x86_64/pmu_counters_test.c | 22 +++++++++++++++++++
> 1 file changed, 22 insertions(+)
>
> diff --git a/tools/testing/selftests/kvm/x86_64/pmu_counters_test.c b/tools/testing/selftests/kvm/x86_64/pmu_counters_test.c
> index f47853f3ab84..fe9f38a3557e 100644
> --- a/tools/testing/selftests/kvm/x86_64/pmu_counters_test.c
> +++ b/tools/testing/selftests/kvm/x86_64/pmu_counters_test.c
> @@ -106,6 +106,28 @@ static void guest_measure_loop(uint8_t idx)
> GUEST_ASSERT_EQ(expect, !!_rdpmc(i));
> }
>
> + if (this_cpu_property(X86_PROPERTY_PMU_NR_FIXED_COUNTERS) < 1)

The ENTIRE point of reworking this_pmu_has() to be able to query fixed counters
was to avoid having to open code checks like this. And this has no basis in
reality, the fixed counters aren't all-or-nothing, e.g. if the number of fixed
counters is '1', then the below will explode because the test will try to write
a non-existent MSR.

> + goto done;
> +
> + if (idx == INTEL_ARCH_INSTRUCTIONS_RETIRED)
> + i = 0;
> + else if (idx == INTEL_ARCH_CPU_CYCLES)
> + i = 1;
> + else if (idx == PSEUDO_ARCH_REFERENCE_CYCLES)
> + i = 2;
> + else
> + goto done;
> +
> + wrmsr(MSR_CORE_PERF_FIXED_CTR0 + i, 0);
> + wrmsr(MSR_CORE_PERF_FIXED_CTR_CTRL, BIT_ULL(4 * i));
> +
> + wrmsr(MSR_CORE_PERF_GLOBAL_CTRL, BIT_ULL(PMC_IDX_FIXED + i));
> + __asm__ __volatile__("loop ." : "+c"((int){NUM_BRANCHES}));
> + wrmsr(MSR_CORE_PERF_GLOBAL_CTRL, 0);
> +
> + GUEST_ASSERT_EQ(expect, !!_rdpmc(PMC_FIXED_RDPMC_BASE | i));

Pulling in context from the previous patch, "expect" is set based on the existence
of the architectural event in CPUID.0xA.EBX. That is completely bogus, especially
for TSC reference cycles which has been incorrectly aliased to Topdown Slots.

expect = this_pmu_has_arch_event(KVM_X86_PMU_FEATURE(UNUSED, idx));

Nothing in the SDM says that an event that's marked unavailable in CPUID.0xA.EBX
magically makes the fixed counter not work. It's a rather bogus CPUID, e.g. I
can't imagine real hardware has any such setup, but silently not counting is most
definitely not correct.

Digging around in KVM, I see that KVM _deliberately_ provides this behavior. That
is just bogus. If the guest can enable a fixed counter, then it should count.
*If* the interpretation of the SDM is that the fixed counter isn't available when
the associated architectural event isn't available, then the most sane behavior
is to not allow the fixed counter to be enabled in the first place. Silently
doing nothing is awful. And again the whole Topdown Slots vs. TSC ref cycles
confusion means this is completely broken regardless of how one interprets the
SDM.

And the above on-demand creation of each KVM_X86_PMU_FEATURE() kinda defeats the
purpose of using well-known names. Rather than smush everything into the general
purpose architectural events, which is definitely broken and arguably a straight
violation of the SDM, I think the best option is to loosely couple the GP vs.
fixed events so that we can reason about the high-level event type, e.g. to
determine which events are "stable" enough to assert on.

It's not the prettiest due to not being able to directly compare structs, but it
at least allows checking for architectural events vs. fixed counters independently,
without completely losing the common bits.

#define X86_PMU_FEATURE_NULL \
({ \
struct kvm_x86_pmu_feature feature = {}; \
\
feature; \
})

static bool pmu_is_null_feature(struct kvm_x86_pmu_feature event)
{
return !(*(u64 *)&event);
}

static void guest_measure_loop(uint8_t idx)
{
const struct {
struct kvm_x86_pmu_feature gp_event;
struct kvm_x86_pmu_feature fixed_event;
} intel_event_to_feature[] = {
[INTEL_ARCH_CPU_CYCLES] = { X86_PMU_FEATURE_CPU_CYCLES, X86_PMU_FEATURE_CPU_CYCLES_FIXED },
[INTEL_ARCH_INSTRUCTIONS_RETIRED] = { X86_PMU_FEATURE_INSNS_RETIRED, X86_PMU_FEATURE_INSNS_RETIRED_FIXED },
/*
* Note, the fixed counter for reference cycles is NOT the same
* as the general purpose architectural event (because the GP
* event is garbage). The fixed counter explicitly counts at
* the same frequency as the TSC, whereas the GP event counts
* at a fixed, but uarch specific, frequency. Bundle them here
* for simplicity.
*/
[INTEL_ARCH_REFERENCE_CYCLES] = { X86_PMU_FEATURE_REFERENCE_CYCLES, X86_PMU_FEATURE_REFERENCE_CYCLES_FIXED },
[INTEL_ARCH_LLC_REFERENCES] = { X86_PMU_FEATURE_LLC_REFERENCES, X86_PMU_FEATURE_NULL },
[INTEL_ARCH_LLC_MISSES] = { X86_PMU_FEATURE_LLC_MISSES, X86_PMU_FEATURE_NULL },
[INTEL_ARCH_BRANCHES_RETIRED] = { X86_PMU_FEATURE_BRANCH_INSNS_RETIRED, X86_PMU_FEATURE_NULL },
[INTEL_ARCH_BRANCHES_MISPREDICTED] = { X86_PMU_FEATURE_BRANCHES_MISPREDICTED, X86_PMU_FEATURE_NULL },
};