Re: perf test hybrid failing on 14700K

From: Ian Rogers
Date: Fri Jan 05 2024 - 12:21:45 EST


On Fri, Jan 5, 2024 at 4:05 AM Mark Rutland <mark.rutland@xxxxxxx> wrote:
>
> On Tue, Jan 02, 2024 at 02:41:07PM -0800, Ian Rogers wrote:
> > On Tue, Jan 2, 2024 at 7:43 AM Liang, Kan <kan.liang@xxxxxxxxxxxxxxx> wrote:
> > > On 2023-12-23 8:55 a.m., Arnaldo Carvalho de Melo wrote:
> > > > Hi Kan,
> > > >
> > > > I noticed the following problem, are you able to reproduce it?
> > > >
> > > > Happy solstice!
> > >
> > > Happy new year!
>
> Godt nytår!
>
> > > > - Arnaldo
> > > >
> > > > root@number:/home/acme/Downloads# grep "model name" -m1 /proc/cpuinfo
> > > > model name : Intel(R) Core(TM) i7-14700K
> > > > root@number:/home/acme/Downloads# uname -a
> > > > Linux number 6.6.4-200.fc39.x86_64 #1 SMP PREEMPT_DYNAMIC Sun Dec 3 18:13:11 UTC 2023 x86_64 GNU/Linux
> > > > root@number:/home/acme/Downloads# perf -vv
> > > > perf version 6.7.rc6.g63daba4e2861
> > > > dwarf: [ on ] # HAVE_DWARF_SUPPORT
> > > > dwarf_getlocations: [ on ] # HAVE_DWARF_GETLOCATIONS_SUPPORT
> > > > syscall_table: [ on ] # HAVE_SYSCALL_TABLE_SUPPORT
> > > > libbfd: [ OFF ] # HAVE_LIBBFD_SUPPORT
> > > > debuginfod: [ on ] # HAVE_DEBUGINFOD_SUPPORT
> > > > libelf: [ on ] # HAVE_LIBELF_SUPPORT
> > > > libnuma: [ on ] # HAVE_LIBNUMA_SUPPORT
> > > > numa_num_possible_cpus: [ on ] # HAVE_LIBNUMA_SUPPORT
> > > > libperl: [ on ] # HAVE_LIBPERL_SUPPORT
> > > > libpython: [ on ] # HAVE_LIBPYTHON_SUPPORT
> > > > libslang: [ on ] # HAVE_SLANG_SUPPORT
> > > > libcrypto: [ on ] # HAVE_LIBCRYPTO_SUPPORT
> > > > libunwind: [ on ] # HAVE_LIBUNWIND_SUPPORT
> > > > libdw-dwarf-unwind: [ on ] # HAVE_DWARF_SUPPORT
> > > > zlib: [ on ] # HAVE_ZLIB_SUPPORT
> > > > lzma: [ on ] # HAVE_LZMA_SUPPORT
> > > > get_cpuid: [ on ] # HAVE_AUXTRACE_SUPPORT
> > > > bpf: [ on ] # HAVE_LIBBPF_SUPPORT
> > > > aio: [ on ] # HAVE_AIO_SUPPORT
> > > > zstd: [ on ] # HAVE_ZSTD_SUPPORT
> > > > libpfm4: [ on ] # HAVE_LIBPFM
> > > > libtraceevent: [ on ] # HAVE_LIBTRACEEVENT
> > > > bpf_skeletons: [ on ] # HAVE_BPF_SKEL
> > > > root@number:/home/acme/Downloads# perf test 75
> > > > 75: x86 hybrid : FAILED!
> > >
> > > The failure should be a regression caused by the a24d9d9dc096 ("perf
> > > parse-events: Make legacy events lower priority than sysfs/JSON")
> > >
> > > @@ -1004,10 +1012,19 @@ static int config_term_pmu(struct
> > > perf_event_attr *attr,
> > > err_str, /*help=*/NULL);
> > > return -EINVAL;
> > > }
> > > - attr->type = PERF_TYPE_HARDWARE;
> > > - attr->config = term->val.num;
> > > - if (perf_pmus__supports_extended_type())
> > > - attr->config |= (__u64)pmu->type << PERF_PMU_TYPE_SHIFT;
> > > + /*
> > > + * If the PMU has a sysfs or json event prefer it over
> > > + * legacy. ARM requires this.
> > > + */
> > > + if (perf_pmu__have_event(pmu, term->config))
> > > For Intel hybrid, the perf_pmu__have_event() should be always true for
> > > all hw events and hw cache events. So the patch will mistakenly update
> > > the type to TYPE_USER. However, On Intel platforms, the type of the hw
> > > events should be TYPE_HARDWARE.
> > >
> > > Seems ARM needs to find another way to distinguish the case.
> > >
> > > Ian, any suggestions?
> >
> > Hi Kan/Mark,
> >
> > Firstly, the perf_pmu__have_event is a test of whether
> > /sys/devices/<pmu>/events or the equivalent json events have the
> > specified event string like "inst_retired.any" - ie it isn't a test of
> > whether the event is supported by the kernel. Mark was quite insistent
> > that the behavior be changed so that if a legacy event is specified
> > with a PMU, the PMU's sysfs/json event is the priority which is a big
> > behavior change on x86.
>
> For the record, I was insistent that the behaviour was *restored*; that was the
> existing behaivour prior to commit:
>
> 5ea8f2ccffb23983 ("perf parse-events: Support hardware events as terms")
>
> ... which was itself a big behaviour change for all architectures, in part led
> to the issue that Hector and Martin were hitting on arm/arm64, and provided no
> recourse to get the prior behaviour when desired.

Fwiw, I don't agree with the description here. The original issue was
that on ARM perf's behavior had become reliant on the core PMUs being
treated as non-core PMUs in that non-core PMUs don't support things
like legacy events. This is/was unsustainable and once fixed exposed a
second bug where ARM's PMU driver didn't support legacy events
correctly, breaking Hector and Martin. I'm not aware of this issue
being fixed in the driver although the failure has been made clear by
this episode. Fixing the PMU driver wouldn't have fixed older kernels
so there is some sense in fixing the tool too.

In the perf tool we've worked around all the issues brought to us by
the ARM PMU by inverting the priority of legacy and sysfs events. Once
again an issue here and Kan was suggesting inverting the sense for
Intel. I'm working hard to make all PMUs on all architectures work the
same and the priority of legacy and sysfs/JSON events during parsing
is some what arbitrary.

It was argued about PMU namespaces in event parsing, no such concept
exists in perf's event parsing code so for the record I've never
bought this. It seemed an excuse to try push ARM's PMU driver failures
onto the perf tool. I can elaborate but don't see the point in
furthering the argument.

It was requested that ARM provide patches to improve testing coverage
for their PMU in perf test, specifically
tools/perf/tests/parse-events.c where relevant tests are skipped as
the PMU name is hard coded to "cpu" which is conventional for a core
PMU, but not on ARM. Were this testing covering big.LITTLE/hybrid
these issues should have come to light earlier.

While much has been done in the perf tool to work around latent issues
with ARM's PMU drivers and the behavior changes they have instigated,
except for 5c816728651a ("arm_pmu: Add PERF_PMU_CAP_EXTENDED_HW_TYPE
capability") where I wrote the original version, there doesn't seem to
have been progress made on the ARM PMU driver nor on ARM testing -
which was why such an issue could exist for so long, across numerous
Linux releases and ultimately break Hector and Martin.

Fwiw, I have tickets in the UK next week for FurbyFest:
https://www.linkedin.com/posts/csmcr_furbyfest-a-festschrift-for-professor-steve-activity-7129811310799642624-YOOe/
if it is useful for me to talk to ARM people face-to-face or with a
convenient timezone.

Thanks,
Ian

> I appreciate that the change I requested was a big behaviour change on x86
> relative to v6.5, but the change above in v6.5 was a big behaviour change on
> arm/arm64 relative to the behaviour established many years prior. I'm sorry
> that I did not catch this earlier at the review stage.
>
> > To get the test passing again I've sent out a test update:
> > https://lore.kernel.org/lkml/20240102215732.1125997-1-irogers@xxxxxxxxxx/
>
> FWIW that looks good to me; I've given my Ack there.
>
> Thanks,
> Mark.
>
> > This switches the legacy events in the test to ones which don't have
> > sysfs or json equivalents. The test is somewhat brittle as an x86 PMU
> > change could decide to add /sys/devices/cpu/events/cycles alongside
> > /sys/devices/cpu/events/cpu-cycles. Ideally we'd be testing all events
> > on core PMUs, for legacy events they may have a sysfs/json override
> > and the test expectation should expect this and assert that the type
> > isn't PERF_TYPE_HARDWARE, etc. For now what I sent out is sufficient
> > to get the "x86 hybrid" test passing. I should probably have done the
> > whole Reported-by.. thing, sorry for missing that.
> >
> > Thanks,
> > Ian
> >
> > >
> > > + term->type_term = PARSE_EVENTS__TERM_TYPE_USER;
> > > + term->no_value = true;
> > > + } else {
> > > + attr->type = PERF_TYPE_HARDWARE;
> > > + attr->config = term->val.num;
> > > + if (perf_pmus__supports_extended_type())
> > > + attr->config |= (__u64)pmu->type << PERF_PMU_TYPE_SHIFT;
> > > + }
> > > return 0;
> > > }
> > > if (term->type_term == PARSE_EVENTS__TERM_TYPE_USER ||
> > >
> > >
> > > Thanks,
> > > Kan
> > > > root@number:/home/acme/Downloads# perf test -v 75
> > > > 75: x86 hybrid :
> > > > --- start ---
> > > > test child forked, pid 4111587
> > > > Using CPUID GenuineIntel-6-B7-1
> > > > running test 0 'cpu_core/cpu-cycles/'
> > > > FAILED arch/x86/tests/hybrid.c:30 wrong type
> > > > Event test failure: test 0 'cpu_core/cpu-cycles/'running test 1 '{cpu_core/cpu-cycles/,cpu_core/instructions/}'
> > > > FAILED arch/x86/tests/hybrid.c:42 wrong type
> > > > Event test failure: test 1 '{cpu_core/cpu-cycles/,cpu_core/instructions/}'running test 2 '{cpu-clock,cpu_core/cpu-cycles/}'
> > > > FAILED arch/x86/tests/hybrid.c:65 wrong type
> > > > Event test failure: test 2 '{cpu-clock,cpu_core/cpu-cycles/}'running test 3 '{cpu_core/cpu-cycles/,cpu-clock}'
> > > > FAILED arch/x86/tests/hybrid.c:78 wrong type
> > > > Event test failure: test 3 '{cpu_core/cpu-cycles/,cpu-clock}'running test 4 '{cpu_core/cpu-cycles/k,cpu_core/instructions/u}'
> > > > FAILED arch/x86/tests/hybrid.c:95 wrong type
> > > > Event test failure: test 4 '{cpu_core/cpu-cycles/k,cpu_core/instructions/u}'running test 5 'r1a'
> > > > running test 6 'cpu_core/r1a/'
> > > > running test 7 'cpu_core/config=10,config1,config2=3,period=1000/u'
> > > > WARNING: event 'N/A' not valid (bits 0-1 of config2 '3' not supported by kernel)!
> > > > running test 8 'cpu_core/LLC-loads/'
> > > > test child finished with -1
> > > > ---- end ----
> > > > x86 hybrid: FAILED!
> > > > root@number:/home/acme/Downloads#
> > > >
> > > > root@number:/home/acme/Downloads# perf trace -e perf_event_open perf test -F -v 75
> > > > 75: x86 hybrid :
> > > > --- start ---
> > > > Using CPUID GenuineIntel-6-B7-1
> > > > running test 0 'cpu_core/cpu-cycles/'
> > > > FAILED arch/x86/tests/hybrid.c:30 wrong type
> > > > Event test failure: test 0 'cpu_core/cpu-cycles/'running test 1 '{cpu_core/cpu-cycles/,cpu_core/instructions/}'
> > > > FAILED arch/x86/tests/hybrid.c:42 wrong type
> > > > Event test failure: test 1 '{cpu_core/cpu-cycles/,cpu_core/instructions/}'running test 2 '{cpu-clock,cpu_core/cpu-cycles/}'
> > > > FAILED arch/x86/tests/hybrid.c:65 wrong type
> > > > Event test failure: test 2 '{cpu-clock,cpu_core/cpu-cycles/}'running test 3 '{cpu_core/cpu-cycles/,cpu-clock}'
> > > > FAILED arch/x86/tests/hybrid.c:78 wrong type
> > > > Event test failure: test 3 '{cpu_core/cpu-cycles/,cpu-clock}'running test 4 '{cpu_core/cpu-cycles/k,cpu_core/instructions/u}'
> > > > FAILED arch/x86/tests/hybrid.c:95 wrong type
> > > > Event test failure: test 4 '{cpu_core/cpu-cycles/k,cpu_core/instructions/u}'running test 5 'r1a'
> > > > running test 6 'cpu_core/r1a/'
> > > > running test 7 'cpu_core/config=10,config1,config2=3,period=1000/u'
> > > > WARNING: event 'N/A' not valid (bits 0-1 of config2 '3' not supported by kernel)!
> > > > running test 8 'cpu_core/LLC-loads/'
> > > > ---- end ----
> > > > x86 hybrid: FAILED!
> > > > 0.000 ( 0.008 ms): :4115165/4115165 perf_event_open(attr_uptr: { type: 0 (PERF_TYPE_HARDWARE), config: 0xa00000000, disabled: 1, { bp_len, config2 }: 0x900000000, branch_sample_type: USER|COUNTERS, sample_regs_user: 0x3ecaddffffffff, sample_stack_user: 4115165, clockid: 925535355, sample_regs_intr: 0x8140c90000a8f7, sample_max_stack: 8, sig_data: 120259084288 }, cpu: -1, group_fd: -1, flags: FD_CLOEXEC) = 3
> > > > 0.010 ( 0.002 ms): :4115165/4115165 perf_event_open(attr_uptr: { type: 0 (PERF_TYPE_HARDWARE), config: 0x400000000, disabled: 1, { bp_len, config2 }: 0x900000000, branch_sample_type: USER|COUNTERS, sample_regs_user: 0x3ecaddffffffff, sample_stack_user: 4115165, clockid: 925538919, sample_regs_intr: 0x8140c90000a8f7, sample_max_stack: 8, sig_data: 120259084288 }, cpu: -1, group_fd: -1, flags: FD_CLOEXEC) = 4
> > > > root@number:/home/acme/Downloads# strace -e perf_event_open perf test -F -v 75
> > > > 75: x86 hybrid :
> > > > --- start ---
> > > > Using CPUID GenuineIntel-6-B7-1
> > > > running test 0 'cpu_core/cpu-cycles/'
> > > > FAILED arch/x86/tests/hybrid.c:30 wrong type
> > > > Event test failure: test 0 'cpu_core/cpu-cycles/'running test 1 '{cpu_core/cpu-cycles/,cpu_core/instructions/}'
> > > > FAILED arch/x86/tests/hybrid.c:42 wrong type
> > > > Event test failure: test 1 '{cpu_core/cpu-cycles/,cpu_core/instructions/}'running test 2 '{cpu-clock,cpu_core/cpu-cycles/}'
> > > > FAILED arch/x86/tests/hybrid.c:65 wrong type
> > > > Event test failure: test 2 '{cpu-clock,cpu_core/cpu-cycles/}'running test 3 '{cpu_core/cpu-cycles/,cpu-clock}'
> > > > FAILED arch/x86/tests/hybrid.c:78 wrong type
> > > > Event test failure: test 3 '{cpu_core/cpu-cycles/,cpu-clock}'running test 4 '{cpu_core/cpu-cycles/k,cpu_core/instructions/u}'
> > > > FAILED arch/x86/tests/hybrid.c:95 wrong type
> > > > Event test failure: test 4 '{cpu_core/cpu-cycles/k,cpu_core/instructions/u}'running test 5 'r1a'
> > > > running test 6 'cpu_core/r1a/'
> > > > running test 7 'cpu_core/config=10,config1,config2=3,period=1000/u'
> > > > WARNING: event 'N/A' not valid (bits 0-1 of config2 '3' not supported by kernel)!
> > > > running test 8 'cpu_core/LLC-loads/'
> > > > perf_event_open({type=PERF_TYPE_HARDWARE, size=0 /* PERF_ATTR_SIZE_??? */, config=0xa<<32|PERF_COUNT_HW_CPU_CYCLES, sample_period=0, sample_type=0, read_format=0, disabled=1, precise_ip=0 /* arbitrary skid */, ...}, 0, -1, -1, PERF_FLAG_FD_CLOEXEC) = 3
> > > > perf_event_open({type=PERF_TYPE_HARDWARE, size=0 /* PERF_ATTR_SIZE_??? */, config=0x4<<32|PERF_COUNT_HW_CPU_CYCLES, sample_period=0, sample_type=0, read_format=0, disabled=1, precise_ip=0 /* arbitrary skid */, ...}, 0, -1, -1, PERF_FLAG_FD_CLOEXEC) = 4
> > > > ---- end ----
> > > > x86 hybrid: FAILED!
> > > > +++ exited with 0 +++
> > > > root@number:/home/acme/Downloads#
> > > >