Re: [PATCH v2 2/6] KVM: x86/pmu: Refactoring find_arch_event() to pmc_perf_hw_id()

From: Jim Mattson
Date: Wed Feb 09 2022 - 14:31:02 EST


On Wed, Feb 9, 2022 at 1:00 AM Like Xu <like.xu.linux@xxxxxxxxx> wrote:
>
> On 5/2/2022 9:55 am, Jim Mattson wrote:
> >> +static unsigned int amd_pmc_perf_hw_id(struct kvm_pmc *pmc)
> >> {
> >> + u8 event_select = pmc->eventsel & ARCH_PERFMON_EVENTSEL_EVENT;
> > On AMD, the event select is 12 bits.
>
> Out of your carefulness, we already know this fact.
>
> This function to get the perf_hw_id by the last 16 bits still works because we
> currently do not have a 12-bits-select event defined in the amd_event_mapping[].
> The 12-bits-select events (if any) will be programed in the type of PERF_TYPE_RAW.

I beg to differ. It doesn't matter whether there are 12-bit event
selects in amd_event_mapping[] or not. The fundamental problem is that
the equality operation on event selects is broken, because it ignores
the high 4 bits (EventSelect[11:8], which AMD places in bits 35:32 of
the PerfEvtSel MSR). Hence, we may actually find an entry in that table
that we *think* is for the requested event, but instead it's for some
other event with 0 in the high 4 bits. For example, if the guest
requests event 0x1d0 (retired fused instructions), they will get event
0xd0 instead. According to amd_event_mapping[], event 0xd0 is
PERF_COUNT_HW_STALLED_CYCLES_FRONTEND. However, according to the
Milan PPR, event 0xd0 doesn't exist. So, I don't actually know what
we're counting.
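
To make the aliasing concrete, here is a minimal userspace sketch (not
kernel code). The enum values and the helpers lookup_8bit() and
lookup_12bit() are hypothetical; only the two table entries are copied
from amd_event_mapping[]. lookup_8bit() compares just PerfEvtSel[7:0],
as the patch does, while lookup_12bit() first reassembles the full
EventSelect[11:0] from bits 7:0 and 35:32.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-ins for the kernel's perf_hw_id values. */
enum hw_id {
	HW_STALLED_CYCLES_FRONTEND,
	HW_STALLED_CYCLES_BACKEND,
	HW_NO_MATCH,		/* plays the role of PERF_COUNT_HW_MAX */
};

struct event_map {
	uint16_t eventsel;	/* full 12-bit AMD event select */
	uint8_t unit_mask;
	enum hw_id id;
};

/* Two entries taken from KVM's amd_event_mapping[]. */
static const struct event_map map[] = {
	{ 0xd0, 0x00, HW_STALLED_CYCLES_FRONTEND },
	{ 0xd1, 0x00, HW_STALLED_CYCLES_BACKEND },
};

#define EVENT_LO_MASK	0xFFULL		/* PerfEvtSel[7:0]   = EventSelect[7:0]  */
#define UMASK_MASK	0xFF00ULL	/* PerfEvtSel[15:8]  = UnitMask          */
#define EVENT_HI_MASK	(0xFULL << 32)	/* PerfEvtSel[35:32] = EventSelect[11:8] */

static_assert(EVENT_HI_MASK == 0xF00000000ULL,
	      "EventSelect[11:8] occupies PerfEvtSel[35:32]");

/* Linear search on (event select, unit mask), like amd_pmc_perf_hw_id(). */
static enum hw_id lookup(uint16_t event, uint8_t umask)
{
	for (size_t i = 0; i < sizeof(map) / sizeof(map[0]); i++)
		if (map[i].eventsel == event && map[i].unit_mask == umask)
			return map[i].id;
	return HW_NO_MATCH;
}

/* Mirrors the patch under review: the high 4 bits are silently dropped. */
enum hw_id lookup_8bit(uint64_t eventsel)
{
	return lookup(eventsel & EVENT_LO_MASK, (eventsel & UMASK_MASK) >> 8);
}

/* Moves bits 35:32 down to bits 11:8 before comparing. */
enum hw_id lookup_12bit(uint64_t eventsel)
{
	uint16_t event = (eventsel & EVENT_LO_MASK) |
			 ((eventsel & EVENT_HI_MASK) >> 24);

	return lookup(event, (eventsel & UMASK_MASK) >> 8);
}
```

With eventsel = 0xd0 | (1ULL << 32), i.e. a guest programming event
0x1d0, lookup_8bit() returns HW_STALLED_CYCLES_FRONTEND, while
lookup_12bit() returns HW_NO_MATCH.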

At the very least, we need a patch like the following (which I fully
expect gmail to mangle):

--- a/arch/x86/kvm/pmu.c
+++ b/arch/x86/kvm/pmu.c
@@ -210,7 +210,8 @@ void reprogram_gp_counter(struct kvm_pmc *pmc, u64 eventsel)
 	if (!allow_event)
 		return;
 
-	if (!(eventsel & (ARCH_PERFMON_EVENTSEL_EDGE |
+	if (!(eventsel & ((0xFULL << 32) |
+			  ARCH_PERFMON_EVENTSEL_EDGE |
 			  ARCH_PERFMON_EVENTSEL_INV |
 			  ARCH_PERFMON_EVENTSEL_CMASK |
 			  HSW_IN_TX |
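
In isolation, the term the hunk adds behaves like the predicate below.
This is a minimal sketch: the helper name may_use_hw_id_table() is
hypothetical, and the EDGE/INV/CMASK/HSW_IN_TX tests from the real
condition are deliberately left out. Any eventsel with a nonzero
EventSelect[11:8] skips the 8-bit table and is left to the raw path.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Same mask as the hunk: PerfEvtSel[35:32] holds EventSelect[11:8]. */
#define AMD_EVENTSEL_EVENT_HI	(0xFULL << 32)

static_assert(AMD_EVENTSEL_EVENT_HI == 0xF00000000ULL,
	      "EventSelect[11:8] occupies PerfEvtSel[35:32]");

/*
 * Hypothetical helper: true only when none of the high event-select
 * bits are set, i.e. when an 8-bit table lookup cannot alias.
 */
bool may_use_hw_id_table(uint64_t eventsel)
{
	return !(eventsel & AMD_EVENTSEL_EVENT_HI);
}
```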

By the way, the following events from amd_event_mapping[] are not
listed in the Milan PPR:
{ 0x7d, 0x07, PERF_COUNT_HW_CACHE_REFERENCES }
{ 0x7e, 0x07, PERF_COUNT_HW_CACHE_MISSES }
{ 0xd0, 0x00, PERF_COUNT_HW_STALLED_CYCLES_FRONTEND }
{ 0xd1, 0x00, PERF_COUNT_HW_STALLED_CYCLES_BACKEND }

Perhaps we should build a table based on amd_f17h_perfmon_event_map[]
for newer AMD processors?