Re: [PATCH] perf test: Retry without grouping for all metrics test

From: Ayush Jain
Date: Wed Jun 14 2023 - 07:39:45 EST


Hello Sandipan,

Thank you for this patch.

On 6/14/2023 2:37 PM, Sandipan Das wrote:
There are cases where a metric uses more events than the number of
counters. E.g. AMD Zen, Zen 2 and Zen 3 processors have four data fabric
counters but the "nps1_die_to_dram" metric has eight events. By default,
the constituent events are placed in a group. Since the events cannot be
scheduled at the same time, the metric is not computed. The all metrics
test also fails because of this.

Before announcing failure, the test can try multiple options for each
available metric. After system-wide mode fails, retry once again with
the "--metric-no-group" option.
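
The retry order described above can be sketched roughly as follows. This is
only an illustration, not the actual stat_all_metrics.sh: the function name
try_metric and the PERF override variable are hypothetical, and the plain
system-wide command in step 1 is assumed from the new step minus
"--metric-no-group".

```shell
#!/bin/bash
# Sketch of the fallback order only; not the real test script.
# PERF is a hypothetical override so the logic can be tried without perf.
PERF=${PERF:-perf}

# try_metric METRIC
# Returns 0 as soon as one attempt prints the metric name, 1 otherwise.
try_metric() {
  local m=$1 result

  # 1. System-wide mode (assumed form of the existing step).
  result=$("$PERF" stat -M "$m" -a sleep 0.01 2>&1)
  [[ "$result" =~ ${m:0:50} ]] && return 0

  # 2. System-wide mode without event grouping (the step this patch adds).
  result=$("$PERF" stat -M "$m" --metric-no-group -a sleep 0.01 2>&1)
  [[ "$result" =~ ${m:0:50} ]] && return 0

  # 3. A longer workload, in case the earlier ones were too small.
  result=$("$PERF" stat -M "$m" perf bench internals synthesize 2>&1)
  [[ "$result" =~ ${m:0:50} ]] && return 0

  return 1
}
```

With this sketch, a metric such as nps1_die_to_dram that cannot be scheduled
as a group would fail step 1 and pass at step 2.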

E.g.

$ sudo perf test -v 100

Before:

100: perf all metrics test :
--- start ---
test child forked, pid 672731
Testing branch_misprediction_ratio
Testing all_remote_links_outbound
Testing nps1_die_to_dram
Metric 'nps1_die_to_dram' not printed in:
Error:
Invalid event (dram_channel_data_controller_4) in per-thread mode, enable system wide with '-a'.
Testing macro_ops_dispatched
Testing all_l2_cache_accesses
Testing all_l2_cache_hits
Testing all_l2_cache_misses
Testing ic_fetch_miss_ratio
Testing l2_cache_accesses_from_l2_hwpf
Testing l2_cache_misses_from_l2_hwpf
Testing op_cache_fetch_miss_ratio
Testing l3_read_miss_latency
Testing l1_itlb_misses
test child finished with -1
---- end ----
perf all metrics test: FAILED!

After:

100: perf all metrics test :
--- start ---
test child forked, pid 672887
Testing branch_misprediction_ratio
Testing all_remote_links_outbound
Testing nps1_die_to_dram
Testing macro_ops_dispatched
Testing all_l2_cache_accesses
Testing all_l2_cache_hits
Testing all_l2_cache_misses
Testing ic_fetch_miss_ratio
Testing l2_cache_accesses_from_l2_hwpf
Testing l2_cache_misses_from_l2_hwpf
Testing op_cache_fetch_miss_ratio
Testing l3_read_miss_latency
Testing l1_itlb_misses
test child finished with 0
---- end ----
perf all metrics test: Ok


The issue is resolved after applying this patch:

$ ./perf test 102 -vvv
102: perf all metrics test :
--- start ---
test child forked, pid 244991
Testing branch_misprediction_ratio
Testing all_remote_links_outbound
Testing nps1_die_to_dram
Testing all_l2_cache_accesses
Testing all_l2_cache_hits
Testing all_l2_cache_misses
Testing ic_fetch_miss_ratio
Testing l2_cache_accesses_from_l2_hwpf
Testing l2_cache_misses_from_l2_hwpf
Testing l3_read_miss_latency
Testing l1_itlb_misses
test child finished with 0
---- end ----
perf all metrics test: Ok

Reported-by: Ayush Jain <ayush.jain3@xxxxxxx>
Signed-off-by: Sandipan Das <sandipan.das@xxxxxxx>

Tested-by: Ayush Jain <ayush.jain3@xxxxxxx>

---
tools/perf/tests/shell/stat_all_metrics.sh | 7 +++++++
1 file changed, 7 insertions(+)

diff --git a/tools/perf/tests/shell/stat_all_metrics.sh b/tools/perf/tests/shell/stat_all_metrics.sh
index 54774525e18a..1e88ea8c5677 100755
--- a/tools/perf/tests/shell/stat_all_metrics.sh
+++ b/tools/perf/tests/shell/stat_all_metrics.sh
@@ -16,6 +16,13 @@ for m in $(perf list --raw-dump metrics); do
   then
     continue
   fi
+  # Failed again, possibly there are not enough counters so retry system wide
+  # mode but without event grouping.
+  result=$(perf stat -M "$m" --metric-no-group -a sleep 0.01 2>&1)
+  if [[ "$result" =~ ${m:0:50} ]]
+  then
+    continue
+  fi
   # Failed again, possibly the workload was too small so retry with something
   # longer.
   result=$(perf stat -M "$m" perf bench internals synthesize 2>&1)
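
Each retry in the hunk above decides success with the same check: the metric
name, cut to its first 50 characters, must appear somewhere in the perf stat
output. A minimal sketch of that check in isolation, where the helper name
metric_printed is hypothetical and not part of perf or the test script:

```shell
#!/bin/bash
# metric_printed OUTPUT METRIC  (hypothetical helper for illustration)
# Returns 0 when OUTPUT contains the first 50 characters of METRIC.
metric_printed() {
  local output=$1 metric=$2
  # The right-hand side is left unquoted, as in the test, so bash treats it
  # as a regular expression. Metric names made of letters, digits and
  # underscores therefore match literally.
  [[ "$output" =~ ${metric:0:50} ]]
}
```

For example, `metric_printed "$result" "$m" || echo "Metric '$m' not printed"`
would report the same condition that the "Before" log shows for
nps1_die_to_dram.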

Thanks & Regards,
Ayush Jain