Re: [PATCH v6 04/20] KVM: x86/pmu: Always treat Fixed counters as available when supported

From: Jim Mattson
Date: Sat Nov 04 2023 - 08:43:56 EST


On Fri, Nov 3, 2023 at 5:02 PM Sean Christopherson <seanjc@xxxxxxxxxx> wrote:
>
> Now that KVM hides fixed counters that can't be virtualized, treat fixed
> counters as available when they are supported, i.e. don't silently ignore
> an enabled fixed counter just because guest CPUID says the associated
> general purpose architectural event is unavailable.
>
> KVM originally treated fixed counters as always available, but that got
> changed as part of a fix to avoid confusing REF_CPU_CYCLES, which does NOT
> map to an architectural event, with the actual architectural event used
> associated with bit 7, TOPDOWN_SLOTS.
>
> The commit justified the change with:
>
> If the event is marked as unavailable in the Intel guest CPUID
> 0AH.EBX leaf, we need to avoid any perf_event creation, whether
> it's a gp or fixed counter.
>
> but that justification doesn't mesh with reality. The Intel SDM uses
> "architectural events" to refer to both general purpose events (the ones
> with the reverse polarity mask in CPUID.0xA.EBX) and the events for fixed
> counters, e.g. the SDM makes statements like:
>
> Each of the fixed-function PMC can count only one architectural
> performance event.
>
> but the fact that fixed counter 2 (TSC reference cycles) doesn't have an
> associated general purpose architectural makes trying to apply the mask
> from CPUID.0xA.EBX impossible. Furthermore, the SDM never explicitly
> says that an architectural events that's marked unavailable in EBX affects
> the fixed counters.
>
> Note, at the time of the change, KVM didn't enforce hardware support, i.e.
> didn't prevent userspace from enumerating support in guest CPUID.0xA.EBX
> for architectural events that aren't supported in hardware. I.e. silently
> dropping the fixed counter didn't somehow protection against counting the
> wrong event, it just enforced guest CPUID.
>
> Arguably, userspace is creating a bogus vCPU model by advertising a fixed
> counter but saying the associated general purpose architectural event is
> unavailable. But regardless of the validity of the vCPU model, letting
> the guest enable a fixed counter and then not actually having it count
> anything is completely nonsensical. I.e. even if all of the above is
> wrong and it's illegal for a fixed counter to exist when the architectural
> event is unavailable, silently doing nothing is still the wrong behavior
> and KVM should instead disallow enabling the fixed counter in the first
> place.
>
> Fixes: a21864486f7e ("KVM: x86/pmu: Fix available_event_types check for REF_CPU_CYCLES event")
> Signed-off-by: Sean Christopherson <seanjc@xxxxxxxxxx>
> ---
> arch/x86/kvm/vmx/pmu_intel.c | 15 ++++++++++++++-
> 1 file changed, 14 insertions(+), 1 deletion(-)
>
> diff --git a/arch/x86/kvm/vmx/pmu_intel.c b/arch/x86/kvm/vmx/pmu_intel.c
> index 8d545f84dc4a..b239e7dbdc9b 100644
> --- a/arch/x86/kvm/vmx/pmu_intel.c
> +++ b/arch/x86/kvm/vmx/pmu_intel.c
> @@ -147,11 +147,24 @@ static bool intel_hw_event_available(struct kvm_pmc *pmc)

As discussed (https://lore.kernel.org/kvm/ZUU12-TUR_1cj47u@xxxxxxxxxx/),
this function should go away.

> u8 unit_mask = (pmc->eventsel & ARCH_PERFMON_EVENTSEL_UMASK) >> 8;
> int i;
>
> + /*
> + * Fixed counters are always available if KVM reaches this point. If a
> + * fixed counter is unsupported in hardware or guest CPUID, KVM doesn't
> + * allow the counter's corresponding MSR to be written. KVM does use
> + * architectural events to program fixed counters, as the interface to
> + * perf doesn't allow requesting a specific fixed counter, e.g. perf
> + * may (sadly) back a guest fixed PMC with a general purposed counter.
> + * But if _hardware_ doesn't support the associated event, KVM simply
> + * doesn't enumerate support for the fixed counter.
> + */
> + if (pmc_is_fixed(pmc))
> + return true;
> +
> BUILD_BUG_ON(ARRAY_SIZE(intel_arch_events) != NR_INTEL_ARCH_EVENTS);
>
> /*
> * Disallow events reported as unavailable in guest CPUID. Note, this
> - * doesn't apply to pseudo-architectural events.
> + * doesn't apply to pseudo-architectural events (see above).
> */
> for (i = 0; i < NR_REAL_INTEL_ARCH_EVENTS; i++) {
> if (intel_arch_events[i].eventsel != event_select ||
> --
> 2.42.0.869.gea05f2083d-goog
>