Re: [PATCH 5/5] perf stat: Add topdown metrics in the default perf stat on the hybrid machine

From: Xing Zhengjun
Date: Thu Jun 09 2022 - 06:42:12 EST




On 6/9/2022 8:09 AM, Namhyung Kim wrote:
On Tue, Jun 7, 2022 at 1:08 AM <zhengjun.xing@xxxxxxxxxxxxxxx> wrote:

From: Zhengjun Xing <zhengjun.xing@xxxxxxxxxxxxxxx>

Topdown metrics are missed in the default perf stat on the hybrid machine,
add Topdown metrics in default perf stat for hybrid systems.

Currently, we support the perf metrics Topdown for the p-core PMU in the
perf stat default, the perf metrics Topdown support for e-core PMU will be
implemented later separately. Refactor the code adds two x86 specific
functions. Widen the size of the event name column by 7 chars, so that all
metrics after the "#" become aligned again.

The perf metrics topdown feature is supported on the cpu_core of ADL. The
dedicated perf metrics counter and the fixed counter 3 are used for the
topdown events. Adding the topdown metrics doesn't trigger multiplexing.

Before:

# ./perf stat -a true

Performance counter stats for 'system wide':

53.70 msec cpu-clock # 25.736 CPUs utilized
80 context-switches # 1.490 K/sec
24 cpu-migrations # 446.951 /sec
52 page-faults # 968.394 /sec
2,788,555 cpu_core/cycles/ # 51.931 M/sec
851,129 cpu_atom/cycles/ # 15.851 M/sec
2,974,030 cpu_core/instructions/ # 55.385 M/sec
416,919 cpu_atom/instructions/ # 7.764 M/sec
586,136 cpu_core/branches/ # 10.916 M/sec
79,872 cpu_atom/branches/ # 1.487 M/sec
14,220 cpu_core/branch-misses/ # 264.819 K/sec
7,691 cpu_atom/branch-misses/ # 143.229 K/sec

0.002086438 seconds time elapsed

After:

# ./perf stat -a true

Performance counter stats for 'system wide':

61.39 msec cpu-clock # 24.874 CPUs utilized
76 context-switches # 1.238 K/sec
24 cpu-migrations # 390.968 /sec
52 page-faults # 847.097 /sec
2,753,695 cpu_core/cycles/ # 44.859 M/sec
903,899 cpu_atom/cycles/ # 14.725 M/sec
2,927,529 cpu_core/instructions/ # 47.690 M/sec
428,498 cpu_atom/instructions/ # 6.980 M/sec
581,299 cpu_core/branches/ # 9.470 M/sec
83,409 cpu_atom/branches/ # 1.359 M/sec
13,641 cpu_core/branch-misses/ # 222.216 K/sec
8,008 cpu_atom/branch-misses/ # 130.453 K/sec
14,761,308 cpu_core/slots/ # 240.466 M/sec
3,288,625 cpu_core/topdown-retiring/ # 22.3% retiring
1,323,323 cpu_core/topdown-bad-spec/ # 9.0% bad speculation
5,477,470 cpu_core/topdown-fe-bound/ # 37.1% frontend bound
4,679,199 cpu_core/topdown-be-bound/ # 31.7% backend bound
646,194 cpu_core/topdown-heavy-ops/ # 4.4% heavy operations # 17.9% light operations
1,244,999 cpu_core/topdown-br-mispredict/ # 8.4% branch mispredict # 0.5% machine clears
3,891,800 cpu_core/topdown-fetch-lat/ # 26.4% fetch latency # 10.7% fetch bandwidth
1,879,034 cpu_core/topdown-mem-bound/ # 12.7% memory bound # 19.0% Core bound

0.002467839 seconds time elapsed

Signed-off-by: Zhengjun Xing <zhengjun.xing@xxxxxxxxxxxxxxx>
Reviewed-by: Kan Liang <kan.liang@xxxxxxxxxxxxxxx>
---
[SNIP]
+const char *arch_get_topdown_pmu_name(struct evlist *evlist, bool warn)
+{
+ const char *pmu_name = "cpu";
+
+ if (perf_pmu__has_hybrid()) {
+ if (!evlist->hybrid_pmu_name) {
+ if (warn)
+ pr_warning
+ ("WARNING: default to use cpu_core topdown events\n");
+ evlist->hybrid_pmu_name =
+ perf_pmu__hybrid_type_to_pmu("core");

This doesn't look good. Please consider reducing the
indent level like returning early as

if (!perf_pmu__has_hybrid())
return "cpu";

if (!evlist->hybrid_pmu_name) {
...

Thanks for the comments, I will update it in the next version.
Thanks,
Namhyung


+ }
+
+ pmu_name = evlist->hybrid_pmu_name;
+ }
+ return pmu_name;
+}

--
Zhengjun Xing