Re: [PATCH v2] perf stat: Introduce skippable evsels

From: Liang, Kan
Date: Fri Apr 21 2023 - 09:32:31 EST




On 2023-04-20 8:19 p.m., Ian Rogers wrote:
>>>> struct evlist *metric_evlist = evlist__new();
>>>> struct evsel *metric_evsel;
>>>>
>>>> diff --git a/tools/perf/util/stat-display.c b/tools/perf/util/stat-display.c
>>>> index 6b46bbb3d322..072fa56744b4 100644
>>>> --- a/tools/perf/util/stat-display.c
>>>> +++ b/tools/perf/util/stat-display.c
>>>> @@ -747,7 +747,7 @@ static void uniquify_event_name(struct evsel *counter)
>>>> int ret = 0;
>>>>
>>>> if (counter->uniquified_name || counter->use_config_name ||
>>>> - !counter->pmu_name || !strncmp(counter->name, counter->pmu_name,
>>>> + !counter->pmu_name || !counter->name ||
>>>> !strncmp(counter->name, counter->pmu_name,
>>>> strlen(counter->pmu_name)))
>>>> return;
>>>
>>> Is this a pre-existing hybrid bug? It is a real shame hybrid shows so
>>> few common code paths. In general evsel__name should be preferred over
>>> directly accessing name.
>>
>>
>> I don't think so.
>>
>> I haven't dig into the bug yet. But from the source code I can tell that
>> the check is the same as the current 6.3-rc7.
>>
>> For the current 6.3-rc7, perf stat true works.
>> The perf stat -M TopdownL1 --metric-no-group can work as well.
>>
>> But with the current perf-tools-next branch, perf stat true gives a
>> Segmentation fault.
>>
>> The TopdownL1 doesn't work either.
>>
>> # ./perf stat -M TopdownL1 --metric-no-group
>> Error:
>> The sys_perf_event_open() syscall returned with 22 (Invalid argument)
>> for event (topdown-retiring).
>> /bin/dmesg | grep -i perf may provide additional information.
>
> I see hybrid failing basic sanity tests both for 6.3 and in
> perf-tools-next. For metrics I see:
>
> ```
> $ git status
> ...
> Your branch is up to date with 'linus/master'
> ...
> $ git describe
> v6.3-rc7-139-gb7bc77e2f2c7
> $ sudo perf stat -M TopdownL1 -a sleep 1

Try the --metric-no-group.


> WARNING: events in group from different hybrid PMUs!
> WARNING: grouped events cpus do not match, disabling group:
> anon group { topdown-retiring, topdown-retiring,
> INT_MISC.UOP_DROPPING, topdown-fe-bound, topdown-fe-bound,
> CPU_CLK_UNHALTED.CORE, topdown-be-bound, topdown-be-bound,
> topdown-bad-spec, topdown-bad-spec }
> Error:
> The sys_perf_event_open() syscall returned with 22 (Invalid argument)
> for event (topdown-retiring).
> /bin/dmesg | grep -i perf may provide additional information.
> ```
>
> It seems perf on hybrid is quite broken in 6.3, but I doubt we can fix
> 6.3 given the late stage of the release cycle. As perf-tools-next
> enables TopdownL1 metrics when no events or metric are specified and
> when the metric group is present, on hybrid this will cause the
> pre-existing bug to appear for the no events/metrics case. I suspect
> this is the cause of the crashes you see, but I'm seeing assertion
> failures and similar as I'm using a debug build.
>
> I'm looking into fixing perf-tools-next and not 6.3. Maybe there will
> be something we can cherry-pick back to fix up 6.3. It hasn't been
> easy to find hardware to test on, and if the machine I'm remotely
> using falls over then I have no means to test, so fingers crossed.
>

OK. So the json metric thing is buggy on both 6.3 and perf-tools-next. I
think it is even worse in the perf-tools-next.
Besides the bugs, the json metric also changes the output layout of perf
stat default. The tools, which are on top of perf, are all impacted.

The question is why we are in such a rush to move the default of perf
stat from reliable kernel metric to json metric?

Can we take a step back? Create a perf/json_metric or whatever branch,
fix all the issues thoroughly, and then merge to the mainline.

I think the default of perf stat is frequently used by not only the
newbees but also the veterans. That could have a big user-visible impact.

The 6.4 merge window is approaching. Can we at least revert the patches
for 6.4?

Arnaldo, what do you think?

Thanks,
Kan