Re: [PATCH] perf/arm64: fix mapping for HW_BRANCH_INSTRUCTIONS on PMUv3

From: Mark Rutland
Date: Fri Feb 04 2022 - 06:48:45 EST


Hi Stephane,

On Thu, Feb 03, 2022 at 11:39:40PM -0800, Stephane Eranian wrote:
> With the existing code, the following command:
>
> $ perf stat -e branches sleep 0
> Performance counter stats for 'sleep 0':
> <not supported> branches
>
> on N1 core (pmuv3).

This is definitely not ideal. :(

> This is due to the fact that the mapping for the generic event is wrong.

I don't think that's quite true; more detail below. This is certainly *messy*
though.

> It is using ARMV8_PMUV3_PERFCTR_PC_WRITE_RETIRED which is not implemented
> on N1 (and most likely on any PMUv3 implementations). However, there is
> another supported event ARMV8_PMUV3_PERFCTR_BR_RETIRED measuring the same
> condition.

I have a couple of concerns here:

1) Both PC_WRITE_RETIRED and BR_RETIRED are OPTIONAL (though the Arm strongly
recommends that BR_RETIRED is implemented), so CPUs may exist which only
support one of the two, or both.

So as-is, this patch may break working support for CPUs which have
PC_WRITE_RETIRED but not BR_RETIRED.

IIUC we should be able to detect whether either are implemented by looking
at PMCEID, and we could take that into account when mapping the event.

2) IIUC (even with ARMv8.6) there is a potential semantic difference between
PC_WRITE_RETIRED and BR_RETIRED, in that e.g. PC_WRITE_RETIRED must include
exception returns while this is IMPLEMENTATION DEFINED for BR_RETIRED.

I guess this might not matter all that much given the precise definition of
"Software change of the PC" is IMPLEMENTATION DEFINED, but I don't think
it's true that the two events count "the same condition", and we should be
more explicit about that.

> This patch switches the mapping to ARMV8_PMUV3_PERFCTR_BR_RETIRED so that
> the perf stat command above works.
>
> Signed-off-by: Stephane Eranian <eranian@xxxxxxxxxx>
> ---
> arch/arm64/kernel/perf_event.c | 2 +-
> 1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/arch/arm64/kernel/perf_event.c b/arch/arm64/kernel/perf_event.c
> index cab678ed6618..ec2b98343a0b 100644
> --- a/arch/arm64/kernel/perf_event.c
> +++ b/arch/arm64/kernel/perf_event.c
> @@ -45,7 +45,7 @@ static const unsigned armv8_pmuv3_perf_map[PERF_COUNT_HW_MAX] = {
> [PERF_COUNT_HW_INSTRUCTIONS] = ARMV8_PMUV3_PERFCTR_INST_RETIRED,
> [PERF_COUNT_HW_CACHE_REFERENCES] = ARMV8_PMUV3_PERFCTR_L1D_CACHE,
> [PERF_COUNT_HW_CACHE_MISSES] = ARMV8_PMUV3_PERFCTR_L1D_CACHE_REFILL,
> - [PERF_COUNT_HW_BRANCH_INSTRUCTIONS] = ARMV8_PMUV3_PERFCTR_PC_WRITE_RETIRED,
> + [PERF_COUNT_HW_BRANCH_INSTRUCTIONS] = ARMV8_PMUV3_PERFCTR_BR_RETIRED,

As above, I don't think we can unconditionally make this change, and instead
should have the mapping function take PMCEID into account to map the event (or
bail out if we don't know a suitable event is implemented).

Thanks,
Mark.

> [PERF_COUNT_HW_BRANCH_MISSES] = ARMV8_PMUV3_PERFCTR_BR_MIS_PRED,
> [PERF_COUNT_HW_BUS_CYCLES] = ARMV8_PMUV3_PERFCTR_BUS_CYCLES,
> [PERF_COUNT_HW_STALLED_CYCLES_FRONTEND] = ARMV8_PMUV3_PERFCTR_STALL_FRONTEND,
> --
> 2.35.0.263.gb82422642f-goog
>