Re: [REGRESSION] Perf (userspace) broken on big.LITTLE systems since v6.5

From: Mark Rutland
Date: Tue Nov 21 2023 - 10:40:12 EST


On Tue, Nov 21, 2023 at 03:24:25PM +0000, Marc Zyngier wrote:
> On Tue, 21 Nov 2023 13:40:31 +0000,
> Marc Zyngier <maz@xxxxxxxxxx> wrote:
> >
> > [Adding key people on Cc]
> >
> > On Tue, 21 Nov 2023 12:08:48 +0000,
> > Hector Martin <marcan@xxxxxxxxx> wrote:
> > >
> > > Perf broke on all Apple ARM64 systems (tested almost everything), and
> > > according to maz also on Juno (so, probably all big.LITTLE) since v6.5.
> >
> > I can confirm that at least on 6.7-rc2, perf is pretty busted on any
> > asymmetric ARM platform. It isn't clear what criteria is used to pick
> > the PMU, but nothing works anymore.
> >
> > The saving grace in my case is that Debian still ships a 6.1 perftool
> > package, but that's obviously not going to last.
> >
> > I'm happy to test potential fixes.
>
> At Mark's request, I've dumped a couple of perf (as of -rc2) runs with
> -vvv. And it is quite entertaining (this is taskset to an 'icestorm'
> CPU):

IIUC the tool is doing the wrong thing here and overriding explicit
${pmu}/${event}/ events with PERF_TYPE_HARDWARE events rather than events using
that ${pmu}'s type and event namespace.

Regardless of the *new* ABI that allows PERF_TYPE_HARDWARE events to be
targetted to a specific PMU, it's semantically wrong to rewrite events like
this since ${pmu}/${event}/ is not necessarily equivalent to a similarly-named
PERF_COUNT_HW_${EVENT}.

Mark.

> <quote>
> maz@valley-girl:~/hot-poop/arm-platforms/tools/perf$ sudo taskset -c 0 ./perf stat -vvv -e apple_icestorm_pmu/cycles/ -e
> apple_firestorm_pmu/cycles/ -e cycles ls
> Using CPUID 0x00000000612f0280
> Attempt to add: apple_icestorm_pmu/cycles=0/
> ..after resolving event: apple_icestorm_pmu/cycles=0/
> Opening: unknown-hardware:HG
> ------------------------------------------------------------
> perf_event_attr:
> type 0 (PERF_TYPE_HARDWARE)
> config 0xb00000000
> disabled 1
> ------------------------------------------------------------
> sys_perf_event_open: pid 0 cpu -1 group_fd -1 flags 0x8
> sys_perf_event_open failed, error -95
> Attempt to add: apple_firestorm_pmu/cycles=0/
> ..after resolving event: apple_firestorm_pmu/cycles=0/
> Control descriptor is not initialized
> Opening: apple_icestorm_pmu/cycles/
> ------------------------------------------------------------
> perf_event_attr:
> type 0 (PERF_TYPE_HARDWARE)
> size 136
> config 0 (PERF_COUNT_HW_CPU_CYCLES)
> sample_type IDENTIFIER
> read_format TOTAL_TIME_ENABLED|TOTAL_TIME_RUNNING
> disabled 1
> inherit 1
> enable_on_exec 1
> exclude_guest 1
> ------------------------------------------------------------
> sys_perf_event_open: pid 1045843 cpu -1 group_fd -1 flags 0x8 = 3
> Opening: apple_firestorm_pmu/cycles/
> ------------------------------------------------------------
> perf_event_attr:
> type 0 (PERF_TYPE_HARDWARE)
> size 136
> config 0 (PERF_COUNT_HW_CPU_CYCLES)
> sample_type IDENTIFIER
> read_format TOTAL_TIME_ENABLED|TOTAL_TIME_RUNNING
> disabled 1
> inherit 1
> enable_on_exec 1
> exclude_guest 1
> ------------------------------------------------------------
> sys_perf_event_open: pid 1045843 cpu -1 group_fd -1 flags 0x8 = 4
> Opening: cycles
> ------------------------------------------------------------
> perf_event_attr:
> type 0 (PERF_TYPE_HARDWARE)
> size 136
> config 0 (PERF_COUNT_HW_CPU_CYCLES)
> sample_type IDENTIFIER
> read_format TOTAL_TIME_ENABLED|TOTAL_TIME_RUNNING
> disabled 1
> inherit 1
> enable_on_exec 1
> exclude_guest 1
> ------------------------------------------------------------
> sys_perf_event_open: pid 1045843 cpu -1 group_fd -1 flags 0x8 = 5
> arch builtin-diff.o builtin-mem.o common-cmds.h perf-completion.sh
> bench builtin-evlist.c builtin-probe.c CREDITS perf.h
> Build builtin-evlist.o builtin-probe.o design.txt perf-in.o
> builtin-annotate.c builtin-ftrace.c builtin-record.c dlfilters perf-iostat
> builtin-annotate.o builtin-ftrace.o builtin-record.o Documentation perf-iostat.sh
> builtin-bench.c builtin.h builtin-report.c FEATURE-DUMP perf.o
> builtin-bench.o builtin-help.c builtin-report.o include perf-read-vdso.c
> builtin-buildid-cache.c builtin-help.o builtin-sched.c jvmti perf-sys.h
> builtin-buildid-cache.o builtin-inject.c builtin-script.c libapi PERF-VERSION-FILE
> builtin-buildid-list.c builtin-inject.o builtin-script.o libperf perf-with-kcore
> builtin-buildid-list.o builtin-kallsyms.c builtin-stat.c libsubcmd pmu-events
> builtin-c2c.c builtin-kallsyms.o builtin-stat.o libsymbol python
> builtin-c2c.o builtin-kmem.c builtin-timechart.c Makefile python_ext_build
> builtin-config.c builtin-kvm.c builtin-top.c Makefile.config scripts
> builtin-config.o builtin-kvm.o builtin-top.o Makefile.perf tests
> builtin-daemon.c builtin-kwork.c builtin-trace.c MANIFEST trace
> builtin-daemon.o builtin-list.c builtin-version.c perf ui
> builtin-data.c builtin-list.o builtin-version.o perf-archive util
> builtin-data.o builtin-lock.c check-headers.sh perf-archive.sh
> builtin-diff.c builtin-mem.c command-list.txt perf.c
> apple_icestorm_pmu/cycles/: -1: 0 873709 0
> apple_firestorm_pmu/cycles/: -1: 0 873709 0
> cycles: -1: 0 873709 0
> apple_icestorm_pmu/cycles/: 0 873709 0
> apple_firestorm_pmu/cycles/: 0 873709 0
> cycles: 0 873709 0
>
> Performance counter stats for 'ls':
>
> <not counted> apple_icestorm_pmu/cycles/ (0.00%)
> <not counted> apple_firestorm_pmu/cycles/ (0.00%)
> <not counted> cycles (0.00%)
>
> 0.000002250 seconds time elapsed
>
> 0.000000000 seconds user
> 0.000000000 seconds sys
> </quote>
>
> If I run the same thing on another CPU cluster (firestorm), I get
> this:
>
> <quote>
> maz@valley-girl:~/hot-poop/arm-platforms/tools/perf$ sudo taskset -c 2 ./perf stat -vvv -e apple_icestorm_pmu/cycles/ -e
> apple_firestorm_pmu/cycles/ -e cycles ls
> Using CPUID 0x00000000612f0280
> Attempt to add: apple_icestorm_pmu/cycles=0/
> ..after resolving event: apple_icestorm_pmu/cycles=0/
> Opening: unknown-hardware:HG
> ------------------------------------------------------------
> perf_event_attr:
> type 0 (PERF_TYPE_HARDWARE)
> config 0xb00000000
> disabled 1
> ------------------------------------------------------------
> sys_perf_event_open: pid 0 cpu -1 group_fd -1 flags 0x8
> sys_perf_event_open failed, error -95
> Attempt to add: apple_firestorm_pmu/cycles=0/
> ..after resolving event: apple_firestorm_pmu/cycles=0/
> Control descriptor is not initialized
> Opening: apple_icestorm_pmu/cycles/
> ------------------------------------------------------------
> perf_event_attr:
> type 0 (PERF_TYPE_HARDWARE)
> size 136
> config 0 (PERF_COUNT_HW_CPU_CYCLES)
> sample_type IDENTIFIER
> read_format TOTAL_TIME_ENABLED|TOTAL_TIME_RUNNING
> disabled 1
> inherit 1
> enable_on_exec 1
> exclude_guest 1
> ------------------------------------------------------------
> sys_perf_event_open: pid 1045925 cpu -1 group_fd -1 flags 0x8 = 3
> Opening: apple_firestorm_pmu/cycles/
> ------------------------------------------------------------
> perf_event_attr:
> type 0 (PERF_TYPE_HARDWARE)
> size 136
> config 0 (PERF_COUNT_HW_CPU_CYCLES)
> sample_type IDENTIFIER
> read_format TOTAL_TIME_ENABLED|TOTAL_TIME_RUNNING
> disabled 1
> inherit 1
> enable_on_exec 1
> exclude_guest 1
> ------------------------------------------------------------
> sys_perf_event_open: pid 1045925 cpu -1 group_fd -1 flags 0x8 = 4
> Opening: cycles
> ------------------------------------------------------------
> perf_event_attr:
> type 0 (PERF_TYPE_HARDWARE)
> size 136
> config 0 (PERF_COUNT_HW_CPU_CYCLES)
> sample_type IDENTIFIER
> read_format TOTAL_TIME_ENABLED|TOTAL_TIME_RUNNING
> disabled 1
> inherit 1
> enable_on_exec 1
> exclude_guest 1
> ------------------------------------------------------------
> sys_perf_event_open: pid 1045925 cpu -1 group_fd -1 flags 0x8 = 5
> arch builtin-diff.o builtin-mem.o common-cmds.h perf-completion.sh
> bench builtin-evlist.c builtin-probe.c CREDITS perf.h
> Build builtin-evlist.o builtin-probe.o design.txt perf-in.o
> builtin-annotate.c builtin-ftrace.c builtin-record.c dlfilters perf-iostat
> builtin-annotate.o builtin-ftrace.o builtin-record.o Documentation perf-iostat.sh
> builtin-bench.c builtin.h builtin-report.c FEATURE-DUMP perf.o
> builtin-bench.o builtin-help.c builtin-report.o include perf-read-vdso.c
> builtin-buildid-cache.c builtin-help.o builtin-sched.c jvmti perf-sys.h
> builtin-buildid-cache.o builtin-inject.c builtin-script.c libapi PERF-VERSION-FILE
> builtin-buildid-list.c builtin-inject.o builtin-script.o libperf perf-with-kcore
> builtin-buildid-list.o builtin-kallsyms.c builtin-stat.c libsubcmd pmu-events
> builtin-c2c.c builtin-kallsyms.o builtin-stat.o libsymbol python
> builtin-c2c.o builtin-kmem.c builtin-timechart.c Makefile python_ext_build
> builtin-config.c builtin-kvm.c builtin-top.c Makefile.config scripts
> builtin-config.o builtin-kvm.o builtin-top.o Makefile.perf tests
> builtin-daemon.c builtin-kwork.c builtin-trace.c MANIFEST trace
> builtin-daemon.o builtin-list.c builtin-version.c perf ui
> builtin-data.c builtin-list.o builtin-version.o perf-archive util
> builtin-data.o builtin-lock.c check-headers.sh perf-archive.sh
> builtin-diff.c builtin-mem.c command-list.txt perf.c
> apple_icestorm_pmu/cycles/: -1: 1035101 469125 469125
> apple_firestorm_pmu/cycles/: -1: 1035035 469125 469125
> cycles: -1: 1034653 469125 469125
> apple_icestorm_pmu/cycles/: 1035101 469125 469125
> apple_firestorm_pmu/cycles/: 1035035 469125 469125
> cycles: 1034653 469125 469125
>
> Performance counter stats for 'ls':
>
> 1,035,101 apple_icestorm_pmu/cycles/
> 1,035,035 apple_firestorm_pmu/cycles/
> 1,034,653 cycles
>
> 0.000001333 seconds time elapsed
>
> 0.000000000 seconds user
> 0.000000000 seconds sys
> </quote>
>
> which doesn't make any sense either. I really don't understand what
> this PERF_TYPE_HARDWARE does here (the *real* types are 10 and 11),
> nor what this 'cycle=0' stuff is.
>
> /puzzled
>
> M.
>
> --
> Without deviation from the norm, progress is not possible.