Re: [V14 3/8] drivers: perf: arm_pmuv3: Enable branch stack sampling framework

From: James Clark
Date: Tue Nov 14 2023 - 12:10:57 EST




On 14/11/2023 05:13, Anshuman Khandual wrote:
[...]
>
> diff --git a/drivers/perf/arm_pmu.c b/drivers/perf/arm_pmu.c
> index d712a19e47ac..76f1376ae594 100644
> --- a/drivers/perf/arm_pmu.c
> +++ b/drivers/perf/arm_pmu.c
> @@ -317,6 +317,15 @@ armpmu_del(struct perf_event *event, int flags)
> struct hw_perf_event *hwc = &event->hw;
> int idx = hwc->idx;
>
> + if (has_branch_stack(event)) {
> + WARN_ON_ONCE(!hw_events->brbe_users);
> + hw_events->brbe_users--;
> + if (!hw_events->brbe_users) {
> + hw_events->brbe_context = NULL;
> + hw_events->brbe_sample_type = 0;
> + }
> + }
> +
> armpmu_stop(event, PERF_EF_UPDATE);
> hw_events->events[idx] = NULL;
> armpmu->clear_event_idx(hw_events, event);
> @@ -333,6 +342,22 @@ armpmu_add(struct perf_event *event, int flags)
> struct hw_perf_event *hwc = &event->hw;
> int idx;
>
> + if (has_branch_stack(event)) {
> + /*
> + * Reset branch records buffer if a new task event gets
> + * scheduled on a PMU which might have existing records.
> + * Otherwise older branch records present in the buffer
> + * might leak into the new task event.
> + */
> + if (event->ctx->task && hw_events->brbe_context != event->ctx) {
> + hw_events->brbe_context = event->ctx;
> + if (armpmu->branch_reset)
> + armpmu->branch_reset();

What about a per-thread event following a per-cpu event? Doesn't that
also need a branch_reset()? If hw_events->brbe_context is already
assigned to the thread's context, the reset is skipped when the
per-thread event is switched back in, even if a per-cpu event was
recording on the same core in the meantime.

I think it should be possible to add a test for this scenario by
creating simultaneous per-cpu and per-thread events and checking for
leakage.
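
To make the ordering concrete, here is a rough userspace model of the
quoted armpmu_add()/armpmu_del() hunks. It is only a sketch: the
model_* names and the "records" counter are made up for illustration
and are not part of the patch, and the counter just stands in for
whatever is sitting in the branch buffer.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins, not the kernel structures. */
struct model_ctx {
	bool is_task;	/* models event->ctx->task != NULL */
};

struct model_hw_events {
	const struct model_ctx *brbe_context;
	unsigned int brbe_users;
	unsigned int records;	/* entries currently in the branch buffer */
};

/* Mirrors the has_branch_stack() hunk in armpmu_add(); true on reset. */
static bool model_add(struct model_hw_events *hw, const struct model_ctx *ctx)
{
	bool reset = false;

	if (ctx->is_task && hw->brbe_context != ctx) {
		hw->brbe_context = ctx;
		hw->records = 0;	/* models armpmu->branch_reset() */
		reset = true;
	}
	hw->brbe_users++;
	return reset;
}

/* Mirrors the has_branch_stack() hunk in armpmu_del(). */
static void model_del(struct model_hw_events *hw)
{
	assert(hw->brbe_users);
	hw->brbe_users--;
	if (!hw->brbe_users)
		hw->brbe_context = NULL;
}

/* Returns how many stale records the per-thread event would see. */
static unsigned int model_leak_scenario(void)
{
	struct model_ctx task_ctx = { .is_task = true };
	struct model_ctx cpu_ctx = { .is_task = false };
	struct model_hw_events hw = { 0 };

	model_add(&hw, &task_ctx);	/* thread event in: reset */
	model_add(&hw, &cpu_ctx);	/* per-cpu event: no reset */
	model_del(&hw);			/* thread switched out, users == 1 */
	hw.records = 8;			/* per-cpu event records another task */

	/* Thread switched back in: context still matches, so no reset. */
	if (model_add(&hw, &task_ctx))
		return 0;
	return hw.records;
}
```

In this model the second model_add() for the thread context does not
reset, so the records captured on behalf of the per-cpu event are still
there when the per-thread event resumes.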

> + }
> + hw_events->brbe_users++;
> + hw_events->brbe_sample_type = event->attr.branch_sample_type;
> + }
> +
> /* An event following a process won't be stopped earlier */
> if (!cpumask_test_cpu(smp_processor_id(), &armpmu->supported_cpus))
> return -ENOENT;
> @@ -512,13 +537,24 @@ static int armpmu_event_init(struct perf_event *event)
> !cpumask_test_cpu(event->cpu, &armpmu->supported_cpus))
> return -ENOENT;
>
> - /* does not support taken branch sampling */
> - if (has_branch_stack(event))
> + /*
> + * Branch stack sampling events are allowed
> + * only on PMU which has required support.
> + */
> + if (has_branch_stack(event) && !armpmu->has_branch_stack)
> return -EOPNOTSUPP;
>
> return __hw_perf_event_init(event);
> }
>

[...]
> +/*
> + * This is a read only constant and safe during multi threaded access
> + */
> +static struct perf_branch_stack zero_branch_stack = { .nr = 0, .hw_idx = -1ULL};
> +
> +static void read_branch_records(struct pmu_hw_events *cpuc,
> + struct perf_event *event,
> + struct perf_sample_data *data,
> + bool *branch_captured)
> +{
> + /*
> + * CPU specific branch records buffer must have been allocated already
> + * for the hardware records to be captured and processed further.
> + */
> + if (WARN_ON(!cpuc->branches))
> + return;
> +
> + /*
> + * Overflowed event's branch_sample_type does not match the configured
> + * branch filters in the BRBE HW. So the captured branch records here
> + * cannot be co-related to the overflowed event. Report to the user as
> + * if no branch records have been captured, and flush branch records.
> + * The same scenario is applicable when the current task context does
> + * not match with overflown event.
> + */
> + if ((cpuc->brbe_sample_type != event->attr.branch_sample_type) ||
> + (event->ctx->task && cpuc->brbe_context != event->ctx)) {
> + perf_sample_save_brstack(data, event, &zero_branch_stack);
> + return;
> + }

I think we should probably add a test for this scenario too, e.g.
checking that a second event opened on the same thread as another
event, but with different brbe settings, always produces zero records.

I actually tried to reproduce this behaviour but couldn't. Not sure if I
did something wrong though.
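
For reference, this is how I read the check, as a small userspace model
of the quoted hunks. Again only a sketch under assumptions: the model_*
names, the 0x1/0x2 filter values and the hw_records counter are
placeholders, not anything from the patch or from BRBE itself.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-ins, not the kernel structures. */
struct model_event {
	const void *ctx;		/* models event->ctx */
	bool is_task;			/* models event->ctx->task != NULL */
	uint64_t branch_sample_type;
};

struct model_cpuc {
	const void *brbe_context;
	uint64_t brbe_sample_type;
	unsigned int hw_records;	/* records captured by the hardware */
};

/* Mirrors the bookkeeping in the armpmu_add() hunk (reset omitted). */
static void model_add(struct model_cpuc *cpuc, const struct model_event *ev)
{
	if (ev->is_task && cpuc->brbe_context != ev->ctx)
		cpuc->brbe_context = ev->ctx;
	cpuc->brbe_sample_type = ev->branch_sample_type;
}

/* Mirrors the mismatch check in read_branch_records(). */
static unsigned int model_read(const struct model_cpuc *cpuc,
			       const struct model_event *ev)
{
	if (cpuc->brbe_sample_type != ev->branch_sample_type ||
	    (ev->is_task && cpuc->brbe_context != ev->ctx))
		return 0;	/* models saving zero_branch_stack */
	return cpuc->hw_records;
}

/* Two events on one thread with different filters; returns what the
 * first-added event would report on overflow. */
static unsigned int model_mismatch_scenario(void)
{
	int thread_ctx = 0;	/* only its address is used, as a context id */
	struct model_event first = { &thread_ctx, true, 0x1 };
	struct model_event second = { &thread_ctx, true, 0x2 };
	struct model_cpuc cpuc = { 0 };

	model_add(&cpuc, &first);
	model_add(&cpuc, &second);	/* overwrites brbe_sample_type */
	cpuc.hw_records = 4;

	assert(model_read(&cpuc, &second) == 4);
	return model_read(&cpuc, &first);
}
```

In this model the event that gets zero records is the one whose
branch_sample_type was not the last one written by add, so the result
may depend on the order in which the events are scheduled in. I have
not confirmed that the real driver behaves this way.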