Re: [PATCH] perf test: Retry without grouping for all metrics test

From: Sandipan Das
Date: Mon Jun 19 2023 - 07:46:49 EST


Hi Ian,

On 6/14/2023 10:10 PM, Ian Rogers wrote:
> On Wed, Jun 14, 2023 at 2:07 AM Sandipan Das <sandipan.das@xxxxxxx> wrote:
>>
>> There are cases where a metric uses more events than the number of
>> counters. E.g. AMD Zen, Zen 2 and Zen 3 processors have four data fabric
>> counters but the "nps1_die_to_dram" metric has eight events. By default,
>> the constituent events are placed in a group. Since the events cannot be
>> scheduled at the same time, the metric is not computed. The all metrics
>> test also fails because of this.
>
> Thanks Sandipan. So this is exposing a bug in the AMD data fabric PMU
> driver. When the events are added the driver should create a fake PMU,
> check that adding the group is valid and if not fail. The failure is
> picked up by the tool and it will remove the group.
>
> I appreciate the need for a time machine to make such a fix work. To
> workaround the issue with the metrics add:
> "MetricConstraint": "NO_GROUP_EVENTS",
> to each metric in the json.
>

Thanks for the suggestions. The amd_uncore driver is indeed missing group
validation checks during event init. Will send out a fix with the
"NO_GROUP_EVENTS" workaround.

>> Before announcing failure, the test can try multiple options for each
>> available metric. After system-wide mode fails, retry once again with
>> the "--metric-no-group" option.
>>
>> E.g.
>>
>> $ sudo perf test -v 100
>>
>> Before:
>>
>> 100: perf all metrics test :
>> --- start ---
>> test child forked, pid 672731
>> Testing branch_misprediction_ratio
>> Testing all_remote_links_outbound
>> Testing nps1_die_to_dram
>> Metric 'nps1_die_to_dram' not printed in:
>> Error:
>> Invalid event (dram_channel_data_controller_4) in per-thread mode, enable system wide with '-a'.
>
> This error doesn't relate to grouping, so I'm confused about having it
> in the commit message, aside from the test failure.
>

Agreed. That's the error message from the last attempt where the test
tries to use a longer running workload (perf bench).

- Sandipan