Re: [PATCH v4 5/6] perf vendor events arm64: Update stall_slot workaround for N2 r0p3

From: James Clark
Date: Wed Aug 09 2023 - 09:06:41 EST




On 08/08/2023 11:18, John Garry wrote:
> On 07/08/2023 15:20, James Clark wrote:
>
> Hi James,
>
> Thanks for the effort in doing this.
>
>> N2 r0p3 doesn't require the workaround [1], so gating on (#slots - 5) no
>> longer works because all N2s have 5 slots. Add a new expression builtin
>> that identifies the need for the workaround correctly.
>>
>> [1]:
>> https://urldefense.com/v3/__https://gitlab.arm.com/telemetry-solution/telemetry-solution/-/blob/main/data/pmu/cpu/neoverse/neoverse-n2-r0p3.json__;!!ACWV5N9M2RV99hQ!Nx1xgWXXS9GBasNpOKQXHWKe8VwpRB0h8lAfOmwUmkkOQTjFqn2NswO8ti8vTcblUfAYN9NAtxqAh-sf0TEOvQ$
>> Signed-off-by: James Clark <james.clark@xxxxxxx>
>> ---
>>   tools/perf/arch/arm64/util/pmu.c              | 21 +++++++++++++++++++
>>   .../arm64/arm/neoverse-n2-v2/metrics.json     |  8 +++----
>>   tools/perf/util/expr.c                        |  4 ++++
>>   tools/perf/util/pmu.c                         |  6 ++++++
>>   tools/perf/util/pmu.h                         |  1 +
>>   5 files changed, 36 insertions(+), 4 deletions(-)
>>
>> diff --git a/tools/perf/arch/arm64/util/pmu.c
>> b/tools/perf/arch/arm64/util/pmu.c
>> index 561de0cb6b95..30e2385a83cf 100644
>> --- a/tools/perf/arch/arm64/util/pmu.c
>> +++ b/tools/perf/arch/arm64/util/pmu.c
>> @@ -2,6 +2,7 @@
>>     #include <internal/cpumap.h>
>>   #include "../../../util/cpumap.h"
>> +#include "../../../util/header.h"
>>   #include "../../../util/pmu.h"
>>   #include "../../../util/pmus.h"
>>   #include <api/fs/fs.h>
>> @@ -62,3 +63,23 @@ double perf_pmu__cpu_slots_per_cycle(void)
>>         return slots ? (double)slots : NAN;
>>   }
>> +
>> +double perf_pmu__no_stall_errata(void)
>
> While I like the approach of encoding the per-CPU support in the metric
> expression, I find that literal "no stall errata" is vague and could
> apply to any "stall errata" for any SoC for any architecture.
>
> If we were going to do it this way, then we would need a more specific
> name, like perf_pmu__no_stall_errata_arm64_errata123456(), but that
> should not be in the core perf code.
>
> Could we instead add a function to check cpuid and have something like
> this in the JSON:
>
> "MetricExpr": "(op_retired / op_spec) * (1 - (stall_slot if
> (cpuid_less_than(410fd493)) else (stall_slot - cpu_cycles)) / (#slots *
> cpu_cycles))"
>
> I'm currently figuring out how cpuid_less_than() would be implemented
> (I'm no great python wrangler), but it would be along the lines of what
> Ian added for "has_event" in
> https://lore.kernel.org/linux-perf-users/20230623151016.4193660-1-irogers@xxxxxxxxxx/
>
> Thanks,
> John

Yeah it looks like it could be done that way. Also, the way I added it,
it doesn't have access to the PMU type, it just does a generic
pmu__find_core_pmu() so won't work very well for heterogeneous systems.

If we're going to do a deeper modification of the expression parser like
with has_event() it might be possible to pass in the actual CPU ID that
the metric is running on which would be better.

I'll have a look.

James