Re: [PATCH v2] perf stat: Introduce skippable evsels

From: Ian Rogers
Date: Fri Apr 21 2023 - 11:59:05 EST


On Fri, Apr 21, 2023 at 6:32 AM Liang, Kan <kan.liang@xxxxxxxxxxxxxxx> wrote:
>
>
>
> On 2023-04-20 8:19 p.m., Ian Rogers wrote:
> >>>> struct evlist *metric_evlist = evlist__new();
> >>>> struct evsel *metric_evsel;
> >>>>
> >>>> diff --git a/tools/perf/util/stat-display.c b/tools/perf/util/stat-display.c
> >>>> index 6b46bbb3d322..072fa56744b4 100644
> >>>> --- a/tools/perf/util/stat-display.c
> >>>> +++ b/tools/perf/util/stat-display.c
> >>>> @@ -747,7 +747,7 @@ static void uniquify_event_name(struct evsel *counter)
> >>>> int ret = 0;
> >>>>
> >>>> if (counter->uniquified_name || counter->use_config_name ||
> >>>> - !counter->pmu_name || !strncmp(counter->name, counter->pmu_name,
> >>>> + !counter->pmu_name || !counter->name ||
> >>>> !strncmp(counter->name, counter->pmu_name,
> >>>> strlen(counter->pmu_name)))
> >>>> return;
> >>>
> >>> Is this a pre-existing hybrid bug? It is a real shame hybrid shows so
> >>> few common code paths. In general evsel__name should be preferred over
> >>> directly accessing name.
> >>
> >>
> >> I don't think so.
> >>
> >> I haven't dig into the bug yet. But from the source code I can tell that
> >> the check is the same as the current 6.3-rc7.
> >>
> >> For the current 6.3-rc7, perf stat true works.
> >> The perf stat -M TopdownL1 --metric-no-group can work as well.
> >>
> >> But with the current perf-tools-next branch, perf stat true gives a
> >> Segmentation fault.
> >>
> >> The TopdownL1 doesn't work either.
> >>
> >> # ./perf stat -M TopdownL1 --metric-no-group
> >> Error:
> >> The sys_perf_event_open() syscall returned with 22 (Invalid argument)
> >> for event (topdown-retiring).
> >> /bin/dmesg | grep -i perf may provide additional information.
> >
> > I see hybrid failing basic sanity tests both for 6.3 and in
> > perf-tools-next. For metrics I see:
> >
> > ```
> > $ git status
> > ...
> > Your branch is up to date with 'linus/master'
> > ...
> > $ git describe
> > v6.3-rc7-139-gb7bc77e2f2c7
> > $ sudo perf stat -M TopdownL1 -a sleep 1
>
> Try the --metric-no-group.

You can't not group topdown events, needed for the performance core,
on Linux 6.3. The logic to make that not necessary was added to
perf-tools-next and you are currently looking to revert the changes in
perf-tools-next.
https://lore.kernel.org/all/20230312021543.3060328-1-irogers@xxxxxxxxxx/

Thanks,
Ian

>
>
> > WARNING: events in group from different hybrid PMUs!
> > WARNING: grouped events cpus do not match, disabling group:
> > anon group { topdown-retiring, topdown-retiring,
> > INT_MISC.UOP_DROPPING, topdown-fe-bound, topdown-fe-bound,
> > CPU_CLK_UNHALTED.CORE, topdown-be-bound, topdown-be-bound,
> > topdown-bad-spec, topdown-bad-spec }
> > Error:
> > The sys_perf_event_open() syscall returned with 22 (Invalid argument)
> > for event (topdown-retiring).
> > /bin/dmesg | grep -i perf may provide additional information.
> > ```
> >
> > It seems perf on hybrid is quite broken in 6.3, but I doubt we can fix
> > 6.3 given the late stage of the release cycle. As perf-tools-next
> > enables TopdownL1 metrics when no events or metric are specified and
> > when the metric group is present, on hybrid this will cause the
> > pre-existing bug to appear for the no events/metrics case. I suspect
> > this is the cause of the crashes you see, but I'm seeing assertion
> > failures and similar as I'm using a debug build.
> >
> > I'm looking into fixing perf-tools-next and not 6.3. Maybe there will
> > be something we can cherry-pick back to fix up 6.3. It hasn't been
> > easy to find hardware to test on, and if the machine I'm remotely
> > using falls over then I have no means to test, so fingers crossed.
> >
>
> OK. So the json metric thing is buggy on both 6.3 and perf-tools-next. I
> think it is even worse in the perf-tools-next.
> Besides the bugs, the json metric also changes the output layout of perf
> stat default. The tools, which are on top of perf, are all impacted.
>
> The question is why we are in such a rush to move the default of perf
> stat from reliable kernel metric to json metric?
>
> Can we take a step back? Create a perf/json_metric or whatever branch,
> fix all the issues thoroughly, and then merge to the mainline.
>
> I think the default of perf stat is frequently used by not only the
> newbees but also the veterans. That could have a big user-visible impact.
>
> The 6.4 merge window is approaching. Can we at least revert the patches
> for 6.4?
>
> Arnaldo, what do you think?
>
> Thanks,
> Kan