Re: [PATCH V11 00/10] arm64/perf: Enable branch stack sampling

From: Anshuman Khandual
Date: Fri Jun 09 2023 - 07:13:39 EST

On 5/31/23 09:34, Anshuman Khandual wrote:
> This series enables perf branch stack sampling support on the arm64 platform
> via a new arch feature called Branch Record Buffer Extension (BRBE). All
> relevant register definitions can be accessed here.
>
> https://developer.arm.com/documentation/ddi0601/2021-12/AArch64-Registers
>
> This series applies on 6.4-rc4.
>
> Changes in V11:
>
> - Fixed the crash for per-cpu events without event->pmu_ctx->task_ctx_data
>
> Changes in V10:
>
> https://lore.kernel.org/all/20230517022410.722287-1-anshuman.khandual@xxxxxxx/
>
> - Rebased the series on v6.4-rc2
> - Moved ARMV8 PMUV3 changes inside drivers/perf/arm_pmuv3.c
> - Moved BRBE driver changes inside drivers/perf/arm_brbe.[c|h]
> - Moved the WARN_ON() inside the if condition in armv8pmu_handle_irq()
>
> Changes in V9:
>
> https://lore.kernel.org/all/20230315051444.1683170-1-anshuman.khandual@xxxxxxx/
>
> - Fixed build problem with has_branch_stack() in arm64 header
> - BRBINF_EL1 definition has been changed from 'Sysreg' to 'SysregFields'
> - Renamed all BRBINF_EL1 call sites as BRBINFx_EL1
> - Dropped static const char branch_filter_error_msg[]
> - Implemented a positive list check for BRBE supported perf branch filters
> - Added a comment in armv8pmu_handle_irq()
> - Implemented per-cpu allocation for struct branch_record records
> - Skipped looping through bank 1 if an invalid record is detected in bank 0
> - Added comment in armv8pmu_branch_read() explaining prohibited region etc
> - Added comment warning about erroneously marking transactions as aborted
> - Replaced the first argument (perf_branch_entry) in capture_brbe_flags()
> - Dropped the last argument (idx) in capture_brbe_flags()
> - Dropped the brbcr argument from capture_brbe_flags()
> - Used perf_sample_save_brstack() to capture branch records for perf_sample_data
> - Added comment explaining rationale for setting BRBCR_EL1_FZP for user only traces
> - Dropped BRBE prohibited state mechanism while in armv8pmu_branch_read()
> - Implemented event task context based branch records save mechanism
>
> Changes in V8:
>
> https://lore.kernel.org/all/20230123125956.1350336-1-anshuman.khandual@xxxxxxx/
>
> - Replaced arm_pmu->features with arm_pmu->has_branch_stack, updated its helper
> - Added a comment and line break before arm_pmu->private element
> - Added WARN_ON_ONCE() in helpers i.e. armv8pmu_branch_[read|valid|enable|disable]()
> - Dropped comments in armv8pmu_enable_event() and armv8pmu_disable_event()
> - Replaced open bank encoding in BRBFCR_EL1 with SYS_FIELD_PREP()
> - Changed brbe_hw_attr->brbe_version from 'bool' to 'int'
> - Changed pr_warn() to pr_warn_once() with values in brbe_get_perf_[type|priv]()
> - Replaced all pr_warn_once() with pr_debug_once() in armv8pmu_branch_valid()
> - Added a comment in branch_type_to_brbcr() for the BRBCR_EL1 privilege settings
> - Modified the comment related to BRBINFx_EL1.LASTFAILED in capture_brbe_flags()
> - Changed brbe_get_perf_entry_type() into brbe_set_perf_entry_type()
> - Renamed brbe_valid() as brbe_record_is_complete()
> - Renamed brbe_source() as brbe_record_is_source_only()
> - Renamed brbe_target() as brbe_record_is_target_only()
> - Inverted checks for !brbe_record_is_[target|source]_only() for info capture
> - Replaced 'fetch' with 'get' in all helpers that extract field value
> - Dropped 'static int brbe_current_bank' optimization in select_brbe_bank()
> - Dropped select_brbe_bank_index() completely, added capture_branch_entry()
> - Process captured branch entries in two separate loops, one for each BRBE bank
> - Moved branch_records_alloc() inside armv8pmu_probe_pmu()
> - Added a forward declaration for the helper has_branch_stack()
> - Added new callbacks armv8pmu_private_alloc() and armv8pmu_private_free()
> - Updated armv8pmu_probe_pmu() to allocate the private structure before SMP call
>
> Changes in V7:
>
> https://lore.kernel.org/all/20230105031039.207972-1-anshuman.khandual@xxxxxxx/
>
> - Folded [PATCH 7/7] into [PATCH 3/7] which enables branch stack sampling event
> - Defined BRBFCR_EL1_BRANCH_FILTERS, BRBCR_EL1_DEFAULT_CONFIG in the header
> - Defined BRBFCR_EL1_DEFAULT_CONFIG in the header
> - Updated BRBCR_EL1_DEFAULT_CONFIG with BRBCR_EL1_FZP
> - Defined BRBCR_EL1_DEFAULT_TS in the header
> - Updated BRBCR_EL1_DEFAULT_CONFIG with BRBCR_EL1_DEFAULT_TS
> - Moved BRBCR_EL1_DEFAULT_CONFIG check inside branch_type_to_brbcr()
> - Moved down BRBCR_EL1_CC, BRBCR_EL1_MPRED later in branch_type_to_brbcr()
> - Also set BRBE in paused state in armv8pmu_branch_disable()
> - Dropped brbe_paused(), set_brbe_paused() helpers
> - Extracted error string via branch_filter_error_msg[] for armv8pmu_branch_valid()
> - Replaced brbe_v1p1 with brbe_version in struct brbe_hw_attr
> - Added valid_brbe_[cc, format, version]() helpers
> - Split a separate brbe_attributes_probe() from armv8pmu_branch_probe()
> - Capture event->attr.branch_sample_type earlier in armv8pmu_branch_valid()
> - Defined enum brbe_bank_idx with possible values for BRBE bank indices
> - Changed armpmu->hw_attr into armpmu->private
> - Added missing space in stub definition for armv8pmu_branch_valid()
> - Replaced both kmalloc() with kzalloc()
> - Added BRBE_BANK_MAX_ENTRIES
> - Updated comment for capture_brbe_flags()
> - Updated comment for struct brbe_hw_attr
> - Dropped space after type cast in a couple of places
> - Replaced inverse with negation for testing BRBCR_EL1_FZP in armv8pmu_branch_read()
> - Captured cpuc->branches->branch_entries[idx] in a local variable
> - Dropped saved_priv from armv8pmu_branch_read()
> - Reorganized PERF_SAMPLE_BRANCH_NO_[CYCLES|FLAGS] related configuration
> - Used FIELD_GET() and FIELD_PREP() wherever applicable
> - Replaced BRBCR_EL1_TS_PHYSICAL with BRBCR_EL1_TS_VIRTUAL
> - Moved valid_brbe_nr(), valid_brbe_cc(), valid_brbe_format(), valid_brbe_version(),
>   select_brbe_bank(), select_brbe_bank_index() helpers inside the C implementation
> - Reorganized brbe_valid_nr() and dropped the pr_warn() message
> - Changed probe sequence in brbe_attributes_probe()
> - Added 'brbcr' argument into capture_brbe_flags() to ascertain correct state
> - Disable BRBE before disabling the PMU event counter
> - Enable PERF_SAMPLE_BRANCH_HV filters when is_kernel_in_hyp_mode()
> - Guard armv8pmu_reset() & armv8pmu_sched_task() with arm_pmu_branch_stack_supported()
>
> Changes in V6:
>
> https://lore.kernel.org/linux-arm-kernel/20221208084402.863310-1-anshuman.khandual@xxxxxxx/
>
> - Restore the exception level privilege after reading the branch records
> - Unpause the buffer after reading the branch records
> - Decouple BRBCR_EL1_EXCEPTION/ERTN from perf event privilege level
> - Reworked BRBE implementation and branch stack sampling support on arm pmu
> - BRBE implementation is now part of overall ARMV8 PMU implementation
> - BRBE implementation moved from drivers/perf/ to inside arch/arm64/kernel/
> - CONFIG_ARM_BRBE_PMU renamed as CONFIG_ARM64_BRBE in arch/arm64/Kconfig
> - File moved - drivers/perf/arm_pmu_brbe.c -> arch/arm64/kernel/brbe.c
> - File moved - drivers/perf/arm_pmu_brbe.h -> arch/arm64/kernel/brbe.h
> - BRBE name has been dropped from struct arm_pmu and struct hw_pmu_events
> - BRBE name has been abstracted out as 'branches' in arm_pmu and hw_pmu_events
> - BRBE name has been abstracted out as 'branches' in ARMV8 PMU implementation
> - Added sched_task() callback into struct arm_pmu
> - Added 'hw_attr' into struct arm_pmu encapsulating possible PMU HW attributes
> - Dropped explicit attributes brbe_(v1p1, nr, cc, format) from struct arm_pmu
> - Dropped brbfcr, brbcr, registers scratch area from struct hw_pmu_events
> - Dropped brbe_users, brbe_context tracking in struct hw_pmu_events
> - Added 'features' tracking into struct arm_pmu with ARM_PMU_BRANCH_STACK flag
> - armpmu->hw_attr maps into 'struct brbe_hw_attr' inside BRBE implementation
> - Set ARM_PMU_BRANCH_STACK in 'arm_pmu->features' after successful BRBE probe
> - Added armv8pmu_branch_reset() inside armv8pmu_branch_enable()
> - Dropped brbe_supported() as events will be rejected via ARM_PMU_BRANCH_STACK
> - Dropped set_brbe_disabled() as well
> - Reformatted armv8pmu_branch_valid() warnings while rejecting unsupported events
>
> Changes in V5:
>
> https://lore.kernel.org/linux-arm-kernel/20221107062514.2851047-1-anshuman.khandual@xxxxxxx/
>
> - Changed BRBCR_EL1.VIRTUAL from 0b1 to 0b01
> - Changed BRBFCR_EL1.EnL into BRBFCR_EL1.EnI
> - Changed config ARM_BRBE_PMU from 'tristate' to 'bool'
>
> Changes in V4:
>
> https://lore.kernel.org/all/20221017055713.451092-1-anshuman.khandual@xxxxxxx/
>
> - Changed ../tools/sysreg declarations as suggested
> - Set PERF_SAMPLE_BRANCH_STACK in data.sample_flags
> - Dropped perfmon_capable() check in armpmu_event_init()
> - s/pr_warn_once/pr_info in armpmu_event_init()
> - Added brbe_format element into struct pmu_hw_events
> - Changed v1p1 as brbe_v1p1 in struct pmu_hw_events
> - Dropped pr_info() from arm64_pmu_brbe_probe(), solved LOCKDEP warning
>
> Changes in V3:
>
> https://lore.kernel.org/all/20220929075857.158358-1-anshuman.khandual@xxxxxxx/
>
> - Moved brbe_stack off the stack; it is now dynamically allocated
> - Return PERF_BR_PRIV_UNKNOWN instead of -1 in brbe_fetch_perf_priv()
> - Moved BRBIDR0, BRBCR, BRBFCR registers and fields into tools/sysreg
> - Created dummy BRBINF_EL1 field definitions in tools/sysreg
> - Dropped ARMPMU_EVT_PRIV framework which cached perfmon_capable()
> - Both exception and exception return branch records are now captured
>   only if the event has PERF_SAMPLE_BRANCH_KERNEL, which would already
>   have been checked in generic perf via perf_allow_kernel()
>
> Changes in V2:
>
> https://lore.kernel.org/all/20220908051046.465307-1-anshuman.khandual@xxxxxxx/
>
> - Dropped branch sample filter helpers consolidation patch from this series
> - Added new hw_perf_event.flags element ARMPMU_EVT_PRIV to cache perfmon_capable()
> - Use cached perfmon_capable() while configuring BRBE branch record filters
>
> Changes in V1:
>
> https://lore.kernel.org/linux-arm-kernel/20220613100119.684673-1-anshuman.khandual@xxxxxxx/
>
> - Added CONFIG_PERF_EVENTS wrapper for all branch sample filter helpers
> - Process new perf branch types via PERF_BR_EXTEND_ABI
>
> Changes in RFC V2:
>
> https://lore.kernel.org/linux-arm-kernel/20220412115455.293119-1-anshuman.khandual@xxxxxxx/
>
> - Added branch_sample_priv() while consolidating other branch sample filter helpers
> - Changed all SYS_BRBXXXN_EL1 register definition encodings per Marc
> - Changed the BRBE driver as per proposed BRBE related perf ABI changes (V5)
> - Added documentation for struct arm_pmu changes, updated commit message
> - Updated commit message for BRBE detection infrastructure patch
> - PERF_SAMPLE_BRANCH_KERNEL gets checked during arm event init (outside the driver)
> - Branch privilege state capture mechanism has now moved inside the driver
>
> Changes in RFC V1:
>
> https://lore.kernel.org/all/1642998653-21377-1-git-send-email-anshuman.khandual@xxxxxxx/
>
> Cc: Catalin Marinas <catalin.marinas@xxxxxxx>
> Cc: Will Deacon <will@xxxxxxxxxx>
> Cc: Mark Rutland <mark.rutland@xxxxxxx>
> Cc: Mark Brown <broonie@xxxxxxxxxx>
> Cc: James Clark <james.clark@xxxxxxx>
> Cc: Rob Herring <robh@xxxxxxxxxx>
> Cc: Marc Zyngier <maz@xxxxxxxxxx>
> Cc: Suzuki Poulose <suzuki.poulose@xxxxxxx>
> Cc: Peter Zijlstra <peterz@xxxxxxxxxxxxx>
> Cc: Ingo Molnar <mingo@xxxxxxxxxx>
> Cc: Arnaldo Carvalho de Melo <acme@xxxxxxxxxx>
> Cc: linux-arm-kernel@xxxxxxxxxxxxxxxxxxx
> Cc: linux-perf-users@xxxxxxxxxxxxxxx
> Cc: linux-kernel@xxxxxxxxxxxxxxx
>
> Anshuman Khandual (10):
>   drivers: perf: arm_pmu: Add new sched_task() callback
>   arm64/perf: Add BRBE registers and fields
>   arm64/perf: Add branch stack support in struct arm_pmu
>   arm64/perf: Add branch stack support in struct pmu_hw_events
>   arm64/perf: Add branch stack support in ARMV8 PMU
>   arm64/perf: Enable branch stack events via FEAT_BRBE
>   arm64/perf: Add PERF_ATTACH_TASK_DATA to events with has_branch_stack()
>   arm64/perf: Add struct brbe_regset helper functions
>   arm64/perf: Implement branch records save on task sched out
>   arm64/perf: Implement branch records save on PMU IRQ

Hello Mark,

I am working on your review comments for the first six patches in this
series and am planning to respin next week. It would be great if you
could also review the rest of the series [PATCH 7 - 10], which adds the
branch records save-restore mechanism, and let me know your thoughts.
That will help in collecting more changes (if required) for the next
spin. Thank you.

- Anshuman