Re: [PATCH V13 - RESEND 00/10] arm64/perf: Enable branch stack sampling

From: Anshuman Khandual
Date: Mon Aug 21 2023 - 04:54:00 EST


Hello Will,

Separated this part out just to understand it better.

On 8/18/23 23:26, Will Deacon wrote:
>> These stubs are necessary to protect PMU drivers built on arm32 which share
>> basic branch record processing abstraction at ARMV8 PMU level. It compiles
>> successfully both on arm32 and arm64 platforms with these changes. Subject
>> line for this patch clearly mentions that as well.

> But they shouldn't be needed. When CONFIG_ARM64_BRBE is not selected, no
> branch record processing functions should be needed, empty or otherwise.

But that is not how the code is organized currently. CONFIG_ARM64_BRBE enables
the driver to bring in BRBE HW specific branch stack callback implementation
details. But these callbacks are always present at ARMV8 PMU level even without
BRBE implementation as well. Hence these call sites are present, regardless of
CONFIG_ARM64_BRBE.

> The callers should not exist in the first place (i.e. the empty definitions

But how to achieve that ? Branch stack needs to be driven along side normal PMU
events, which in turn get driven from 'struct arm_pmu' callbacks. Hence branch
stack callbacks too needs to be at ARMV8 PMU level, and closely tied to normal
PMU event handling callbacks. Let's examine from where all these new callbacks
are called from.

* armv8pmu_disable_event() ---> armv8pmu_branch_disable()
* armv8pmu_handle_irq() ---> armv8pmu_branch_read()
* armv8pmu_sched_task() ---> armv8pmu_branch_save()
* armv8pmu_sched_task() ---> armv8pmu_branch_reset()
* armv8pmu_reset() ---> armv8pmu_branch_reset()
* __armv8_pmuv3_map_event() ---> armv8pmu_branch_attr_valid()
* __armv8pmu_probe_pmu() ---> armv8pmu_branch_probe()
* armv8pmu_probe_pmu() ---> armv8pmu_task_ctx_cache_alloc()
* armv8pmu_probe_pmu() ---> branch_records_alloc()

Please note that, branch_records_alloc() being deferred allocation is only one
that is platform agnostic.

I did not intend to make any of the BRBE details visible at ARMV8 PMU level i.e
right inside armv8pmu_XXXX() definitions, as ARMV8 PMU is shared between arm32
and arm64 platforms.

Hence these new branch stack callbacks are added along with required fallback
stubs for build protection, so that the HW implementations can be hidden inside
a new driver wrapped in CONFIG_ARM64_BRBE. Please note that these new branch
stack functions are not 'struct arm_pmu' callbacks but instead normal helpers.

> should live in the core driver code, not in the arch header, or the calling

Driver code is compiled with CONFIG_ARM64_BRBE, hence the real definitions are
there. Instead default stubs are required only when armv8pmu_branch_XXX() call
backs are not defined. But are you suggesting that these stubs be moved inside
drivers/perf/arm_pmuv3.c (where all call sites reside), so that there will be
just one stub copy both for arm32 and arm64 platforms removing their duplicate
definitions from arch headers i.e arch/arm64/include/asm/perf_event.h and
arch/arm/include/asm/arm_pmuv3.h ?

> code should not be compiled at all).

Branch stack sampling is always enabled from core perf perspective without any
config option to wrap around, hence calling code cannot be selectively enabled
or disabled.

- Anshuman