Re: [PATCH] perf test: Fix session topology test on heterogeneous systems

From: Ian Rogers
Date: Mon Jan 22 2024 - 12:46:11 EST


Hi James, I think the subject should be something like "perf evlist:
Fix new_default for >1 core PMU" as the change will apply more widely
than just the test. The test failure fix can be in the subject. You
could add a:

Closes: https://lore.kernel.org/lkml/CAP-5=fWVQ-7ijjK3-w1q+k2WYVNHbAcejb-xY0ptbjRw476VKA@xxxxxxxxxxxxxx/

On Mon, Jan 22, 2024 at 7:55 AM James Clark <james.clark@xxxxxxx> wrote:
>
> The test currently fails with this message when evlist__new_default()
> opens more than one event:
>
> 32: Session topology :
> --- start ---
> templ file: /tmp/perf-test-vv5YzZ
> Using CPUID 0x00000000410fd070
> Opening: unknown-hardware:HG
> ------------------------------------------------------------
> perf_event_attr:
> type 0 (PERF_TYPE_HARDWARE)
> config 0xb00000000
> disabled 1
> ------------------------------------------------------------
> sys_perf_event_open: pid 0 cpu -1 group_fd -1 flags 0x8 = 4
> Opening: unknown-hardware:HG
> ------------------------------------------------------------
> perf_event_attr:
> type 0 (PERF_TYPE_HARDWARE)
> config 0xa00000000
> disabled 1
> ------------------------------------------------------------
> sys_perf_event_open: pid 0 cpu -1 group_fd -1 flags 0x8 = 5
> non matching sample_type
> FAILED tests/topology.c:73 can't get session
> ---- end ----
> Session topology: FAILED!
>
> This is because when re-opening the file and parsing the header, Perf
> expects that any file that has more than one event has the session ID
> flag set. Perf record already sets the flag in a similar way when there
> is more than one event, so add the same logic to evlist__new_default().
>
> evlist__new_default() is only currently used in tests, so I don't
> expect this change to have any other side effects.
>
> The session topology test has been failing on Arm big.LITTLE platforms
> since commit 251aa040244a ("perf parse-events: Wildcard most
> "numeric" events") when evlist__new_default() started opening multiple
> events for 'cycles'.
>
> Fixes: 251aa040244a ("perf parse-events: Wildcard most "numeric" events")
> Signed-off-by: James Clark <james.clark@xxxxxxx>
> ---
> tools/perf/util/evlist.c | 5 +++++
> 1 file changed, 5 insertions(+)
>
> diff --git a/tools/perf/util/evlist.c b/tools/perf/util/evlist.c
> index 95f25e9fb994..56db37fac6f6 100644
> --- a/tools/perf/util/evlist.c
> +++ b/tools/perf/util/evlist.c
> @@ -95,6 +95,7 @@ struct evlist *evlist__new_default(void)
> struct evlist *evlist = evlist__new();
> bool can_profile_kernel;
> int err;
> + struct evsel *evsel;
>
> if (!evlist)
> return NULL;
> @@ -106,6 +107,10 @@ struct evlist *evlist__new_default(void)
> evlist = NULL;
> }
>
> + if (evlist->core.nr_entries > 1)
> + evlist__for_each_entry(evlist, evsel)
> + evsel__set_sample_id(evsel, false);
> +

nit: the if should have curly braces; with them we can reduce the scope
of evsel, as below. It is also nice to name the argument when passing a
constant [1].

	if (evlist->core.nr_entries > 1) {
		struct evsel *evsel;

		evlist__for_each_entry(evlist, evsel)
			evsel__set_sample_id(evsel, /*can_sample_identifier=*/false);
	}

Tested-by: Ian Rogers <irogers@xxxxxxxxxx>
(also Reviewed-by)

When testing this with Mark's change [2], I see two failures on alderlake:
```
irogers@alderlake:~$ perf test 74 -vv
Couldn't bump rlimit(MEMLOCK), failures may take place when creating
BPF maps, etc
74: daemon operations :
--- start ---
test child forked, pid 553821
test daemon list
test daemon reconfig
test daemon stop
test daemon signal
signal 12 sent to session 'test [554082]'
signal 12 sent to session 'test [554082]'
FAILED: perf data no generated
test daemon ping
test daemon lock
test child finished with -1
---- end ----
daemon operations: FAILED!
irogers@alderlake:~$ perf test 76 -vv
Couldn't bump rlimit(MEMLOCK), failures may take place when creating
BPF maps, etc
76: perf list tests :
--- start ---
test child forked, pid 554167
Json output test
Expecting ',' delimiter: line 4971 column 2 (char 243497)
test child finished with -1
---- end ----
perf list tests: FAILED!
```
So I think this patch may be exposing other latent issues. I'll try to
take a look.

Another thought: rather than having an evlist validate, we should just
assert that the evlist is always in good shape whenever it is mutated.
That would have avoided this bug, as the code would have blown up
early.
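
To make the idea concrete, here is a rough, self-contained sketch. It is
not perf code: toy_evsel, toy_evlist and both functions are hypothetical
stand-ins, and the single sample_id flag only models the sample ID bit
discussed above. The mutator restores the invariant and then asserts it,
so a broken list would fail at the point of mutation rather than later
when the file is re-opened.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins, not the real perf structures. */
#define TOY_MAX_EVENTS 8

struct toy_evsel {
	bool sample_id;	/* models the sample ID bit of an event */
};

struct toy_evlist {
	struct toy_evsel entries[TOY_MAX_EVENTS];
	size_t nr_entries;
};

/* Invariant: with more than one event, every event carries a sample ID. */
static bool toy_evlist__is_consistent(const struct toy_evlist *evlist)
{
	size_t i;

	if (evlist->nr_entries <= 1)
		return true;

	for (i = 0; i < evlist->nr_entries; i++) {
		if (!evlist->entries[i].sample_id)
			return false;
	}
	return true;
}

/* Mutator: add an event, restore the invariant, then assert it. */
static int toy_evlist__add(struct toy_evlist *evlist)
{
	size_t i;

	if (evlist->nr_entries == TOY_MAX_EVENTS)
		return -1;

	evlist->entries[evlist->nr_entries++].sample_id = false;

	if (evlist->nr_entries > 1) {
		for (i = 0; i < evlist->nr_entries; i++)
			evlist->entries[i].sample_id = true;
	}

	assert(toy_evlist__is_consistent(evlist));
	return 0;
}
```

In this toy model, a mutator that forgot the fix-up loop would trip the
assert on the second toy_evlist__add() call, which is the "blow up
early" behaviour I have in mind.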

Thanks,
Ian

[1] https://clang.llvm.org/extra/clang-tidy/checks/bugprone/argument-comment.html
[2] https://lore.kernel.org/lkml/20240116170348.463479-1-mark.rutland@xxxxxxx/

> return evlist;
> }


>
> --
> 2.34.1
>