Re: [PATCH] perf print-events: make is_event_supported() more robust

From: Namhyung Kim
Date: Fri Jan 19 2024 - 00:57:54 EST


Hello,

Adding Ravi to CC.

On Wed, Jan 17, 2024 at 4:12 AM Mark Rutland <mark.rutland@xxxxxxx> wrote:
>
> On Wed, Jan 17, 2024 at 09:05:25AM +0000, Marc Zyngier wrote:
> > Hi Mark,
> >
> > On Tue, 16 Jan 2024 17:03:48 +0000,
> > Mark Rutland <mark.rutland@xxxxxxx> wrote:
> > >
> > > Currently the perf tool doesn't deteect support for extneded event types
> > > on Apple M1/M2 systems, and will not auto-expand plain PERF_EVENT_TYPE
> > > hardware events into per-PMU events. This is due to the detection of
> > > extended event types not handling mandatory filters required by the
> > > M1/M2 PMU driver.
> >
> > Thanks for looking into this.
> >
> > I've given your patch a go on my M1 box, and it indeed makes things
> > substantially better:
> >
> > $ sudo ./perf stat -e cycles ~/hackbench 100 process 1000
> > Running with 100*40 (== 4000) tasks.
> > Time: 3.419
> >
> > Performance counter stats for '/home/maz/hackbench 100 process 1000':
> >
> > 174,783,472,090 apple_firestorm_pmu/cycles/ (93.10%)
> > 39,134,744,813 apple_icestorm_pmu/cycles/ (71.86%)
> >
> > 3.568145595 seconds time elapsed
> >
> > 12.203084000 seconds user
> > 55.135271000 seconds sys
>
> Thanks for giving that a spin!
>
> > However, I'm seeing some slightly odd behaviours:
> >
> > $ sudo ./perf stat -e cycles:k ~/hackbench 100 process 1000
> > Running with 100*40 (== 4000) tasks.
> > Time: 3.313
> >
> > Performance counter stats for '/home/maz/hackbench 100 process 1000':
> >
> > <not supported> apple_firestorm_pmu/cycles:k/
> > <not supported> apple_icestorm_pmu/cycles:k/

Hmm.. I guess this should look like apple_firestorm_pmu/cycles/k.
IIRC there was a thread for this, right?

> >
> > 3.467568841 seconds time elapsed
> >
> > 13.080111000 seconds user
> > 53.162099000 seconds sys
> >
> > I would have expected it to count, but it didn't. For that to work, I
> > have to add the 'H' modifier:
>
> Ok, so that'll have something to do with the way the tool chooses which
> perf_evant_attr::exclude_* bits to set. I thought that was the same for plain
> events and pmu_name/event/ events, but I could be mistaken.

I think it sets the attr.exclude_guest by event_attr_init(). Maybe
it's deleted during the missing feature detection logic. But IIUC
it should work on each PMU separately.

By the way, I really hope the kernel exports caps/exclude_bits
for PMUs so that tools can see which bits are supported. For
example AMD IBS has CAP_NO_EXCLUDE so setting exclude_guest
will fail to open. Then it disables the new features added after
that in the missing feature detection logic.

If we know if it doesn't support any exclude bits, then tools can
try other features after removing the bit first.

Thanks,
Namhyung

>
> Is that something you had tried prior to this patch, and did that "just work"
> with the explicit pmu_name/event/ syntax prior to this patch?
>
> e.g. did something like:
>
> perf stat -e apple_firestorm_pmu/cycles/k -e apple_icestorm_pmu/cycles/k ./workload
>
> ... happen to work withiout requiring the addition of 'H'?
>
> If so, does that behave the same before/after this patch?
>
> ... and could you run that with '-vvv' and dump the output for comparison?
>
> Thanks,
> Mark.