Re: [PATCH 1/4] perf vendor events: Add core event list for Icelake Server

From: Jin, Yao
Date: Wed Jun 02 2021 - 20:56:23 EST


Hi Arnaldo,

On 6/2/2021 11:12 PM, Arnaldo Carvalho de Melo wrote:
Em Wed, Jun 02, 2021 at 09:55:49PM +0800, Jin, Yao escreveu:
Hi Arnaldo,

On 6/2/2021 7:26 PM, Arnaldo Carvalho de Melo wrote:
Em Tue, May 25, 2021 at 09:42:55AM -0300, Arnaldo Carvalho de Melo escreveu:
Em Mon, May 24, 2021 at 09:08:12AM +0800, Jin, Yao escreveu:
Could you pull the top 4 patches from "https://github.com/yaoj/icx-events.git"?

perf vendor events: Update event list for Icelake Client
perf vendor events: Add metrics for Icelake Server
perf vendor events: Add uncore event list for Icelake Server

The patch is too big and may have been corrupted by the mailing system.
Thanks, applied.

So, this is failing 'perf test 10', see details below, please run 'perf
test' before pushing patches upstream.

Triple checking:

⬢[acme@toolbox perf]$ git cherry-pick 8f74f0f4dbf6361f0a5d21c5da260fbbf7597286
Removing tools/perf/pmu-events/arch/x86/icelakex/icx-metrics.json
[perf/core 6971d24f4d04ccfa] Revert "perf vendor events intel: Add metrics for Icelake Server"
Date: Wed Jun 2 08:16:20 2021 -0300
1 file changed, 327 deletions(-)
delete mode 100644 tools/perf/pmu-events/arch/x86/icelakex/icx-metrics.json
⬢[acme@toolbox perf]$ git log --oneline -1
6971d24f4d04ccfa (HEAD -> perf/core) Revert "perf vendor events intel: Add metrics for Icelake Server"
⬢[acme@toolbox perf]$ (rm -rf /tmp/build/perf ; mkdir -p /tmp/build/perf ; make -k CORESIGHT=1 BUILD_BPF_SKEL=1 PYTHON=python3 O=/tmp/build/perf -C tools/perf install-bin) > /dev/null 2>&1 ; perf test 10
10: PMU events :
10.1: PMU event table sanity : Ok
10.2: PMU event map aliases : Ok
10.3: Parsing of PMU event table metrics : Ok
10.4: Parsing of PMU event table metrics with fake PMUs : Ok
⬢[acme@toolbox perf]$ git reset --hard HEAD~
HEAD is now at 0ab8009b3e8dd6ba Merge remote-tracking branch 'torvalds/master' into perf/core
⬢[acme@toolbox perf]$ (rm -rf /tmp/build/perf ; mkdir -p /tmp/build/perf ; make -k CORESIGHT=1 BUILD_BPF_SKEL=1 PYTHON=python3 O=/tmp/build/perf -C tools/perf install-bin) > /dev/null 2>&1 ; perf test 10
10: PMU events :
10.1: PMU event table sanity : Ok
10.2: PMU event map aliases : Ok
10.3: Parsing of PMU event table metrics : Ok
10.4: Parsing of PMU event table metrics with fake PMUs : FAILED!
⬢[acme@toolbox perf]$

- Arnaldo

⬢[acme@toolbox perf]$ git bisect bad
d89bf9cab1f613e4496f929d89477b2baaad1ea9 is the first bad commit
commit d89bf9cab1f613e4496f929d89477b2baaad1ea9
Author: Jin Yao <yao.jin@xxxxxxxxxxxxxxx>
Date: Sat May 8 13:06:20 2021 +0800

perf vendor events intel: Add metrics for Icelake Server

Add JSON metrics for Icelake Server to perf.

Based on TMA metrics 4.21 at 01.org.:

https://download.01.org/perfmon/

Signed-off-by: Jin Yao <yao.jin@xxxxxxxxxxxxxxx>
Reviewed-by: Andi Kleen <ak@xxxxxxxxxxxxxxx>
Acked-by: Ian Rogers <irogers@xxxxxxxxxx>
Cc: Alexander Shishkin <alexander.shishkin@xxxxxxxxxxxxxxx>
Cc: Andi Kleen <ak@xxxxxxxxxxxxxxx>
Cc: Ingo Molnar <mingo@xxxxxxxxxx>
Cc: Jiri Olsa <jolsa@xxxxxxxxxx>
Cc: Kan Liang <kan.liang@xxxxxxxxx>
Cc: Peter Zijlstra <peterz@xxxxxxxxxxxxx>
Link: http://lore.kernel.org/lkml/c0f27643-bebb-2912-56ed-f7abec7dbde3@xxxxxxxxxxxxxxx
Signed-off-by: Arnaldo Carvalho de Melo <acme@xxxxxxxxxx>

.../pmu-events/arch/x86/icelakex/icx-metrics.json | 327 +++++++++++++++++++++
1 file changed, 327 insertions(+)
create mode 100644 tools/perf/pmu-events/arch/x86/icelakex/icx-metrics.json
⬢[acme@toolbox perf]$


⬢[acme@toolbox perf]$ perf test -v 10 |& tail -40
parsing 'inst_retired.any / cpu_clk_unhalted.distributed'
parsing '( 1 * ( fp_arith_inst_retired.scalar_single + fp_arith_inst_retired.scalar_double ) + 2 * fp_arith_inst_retired.128b_packed_double + 4 * ( fp_arith_inst_retired.128b_packed_single + fp_arith_inst_retired.256b_packed_double ) + 8 * ( fp_arith_inst_retired.256b_packed_single + fp_arith_inst_retired.512b_packed_double ) + 16 * fp_arith_inst_retired.512b_packed_single ) / cpu_clk_unhalted.distributed'
parsing 'uops_executed.thread / ( uops_executed.core_cycles_ge_1 / 2 )'
parsing 'cpu_clk_unhalted.distributed'
parsing 'inst_retired.any / mem_inst_retired.all_loads'
parsing 'inst_retired.any / mem_inst_retired.all_stores'
parsing 'inst_retired.any / br_inst_retired.all_branches'
parsing 'inst_retired.any / br_inst_retired.near_call'
parsing 'br_inst_retired.all_branches / br_inst_retired.near_taken'
parsing 'inst_retired.any / ( 1 * ( fp_arith_inst_retired.scalar_single + fp_arith_inst_retired.scalar_double ) + 2 * fp_arith_inst_retired.128b_packed_double + 4 * ( fp_arith_inst_retired.128b_packed_single + fp_arith_inst_retired.256b_packed_double ) + 8 * ( fp_arith_inst_retired.256b_packed_single + fp_arith_inst_retired.512b_packed_double ) + 16 * fp_arith_inst_retired.512b_packed_single )'
parsing 'inst_retired.any'
parsing 'lsd.uops / (idq.dsb_uops + lsd.uops + idq.mite_uops + idq.ms_uops)'
parsing 'idq.dsb_uops / (idq.dsb_uops + lsd.uops + idq.mite_uops + idq.ms_uops)'
parsing 'l1d_pend_miss.pending / ( mem_load_retired.l1_miss + mem_load_retired.fb_hit )'
parsing 'l1d_pend_miss.pending / l1d_pend_miss.pending_cycles'
parsing '( itlb_misses.walk_pending + dtlb_load_misses.walk_pending + dtlb_store_misses.walk_pending ) / ( 2 * cpu_clk_unhalted.distributed )'
parsing '64 * l1d.replacement / 1000000000 / duration_time'
parsing '64 * l2_lines_in.all / 1000000000 / duration_time'
parsing '64 * longest_lat_cache.miss / 1000000000 / duration_time'
parsing '64 * offcore_requests.all_requests / 1000000000 / duration_time'
parsing '1000 * mem_load_retired.l1_miss / inst_retired.any'
parsing '1000 * mem_load_retired.l2_miss / inst_retired.any'
parsing '1000 * ( ( offcore_requests.all_data_rd - offcore_requests.demand_data_rd ) + l2_rqsts.all_demand_miss + l2_rqsts.swpf_miss ) / inst_retired.any'
parsing '1000 * mem_load_retired.l3_miss / inst_retired.any'
parsing '1000 * l2_lines_out.silent / inst_retired.any'
parsing '1000 * l2_lines_out.non_silent / inst_retired.any'
parsing 'cpu_clk_unhalted.ref_tsc / msr@tsc@'
parsing '(cpu_clk_unhalted.thread / cpu_clk_unhalted.ref_tsc) * msr@tsc@ / 1000000000 / duration_time'
parsing '( ( 1 * ( fp_arith_inst_retired.scalar_single + fp_arith_inst_retired.scalar_double ) + 2 * fp_arith_inst_retired.128b_packed_double + 4 * ( fp_arith_inst_retired.128b_packed_single + fp_arith_inst_retired.256b_packed_double ) + 8 * ( fp_arith_inst_retired.256b_packed_single + fp_arith_inst_retired.512b_packed_double ) + 16 * fp_arith_inst_retired.512b_packed_single ) / 1000000000 ) / duration_time'
parsing 'cpu_clk_unhalted.thread / cpu_clk_unhalted.ref_tsc'
parsing '1 - cpu_clk_unhalted.one_thread_active / cpu_clk_unhalted.ref_distributed'
parsing 'cpu_clk_unhalted.thread:k / cpu_clk_unhalted.thread'
parsing '( 64 * ( uncore_imc@cas_count_read@ + uncore_imc@cas_count_write@ ) / 1000000000 ) / duration_time'
parsing '1000000000 * ( cha@event\=0x36\,umask\=0x21\,config\=0x40433@ / cha@event\=0x35\,umask\=0x21\,config\=0x40433@ ) / ( cha_0@event\=0x0@ / duration_time )'
parsing 'cha@event\=0x36\,umask\=0x21\,config\=0x40433@ / cha@event\=0x36\,umask\=0x21\,config\=0x40433\,thresh\=1@'
parsing '( 1000000000 * ( cha@event\=0x36\,umask\=0x21\,config\=0x40433@_pmm / cha@event\=0x35\,umask\=0x21\,config\=0x40433@_pmm ) / cha_0@event\=0x0@ )'
check_parse_fake failed
test child finished with -1
---- end ----
PMU events subtest 4: FAILED!
⬢[acme@toolbox perf]$


Very sorry about the "Parsing of PMU event table metrics with fake PMUs"
failure! I will resubmit the patch, together with the other c-state metrics updates.

So have you figured out what was wrong from the verbose output above?

- Arnaldo


Yes, thanks Arnaldo!

The issue was the 'config\=0x40433@_pmm' term in the MetricExpr of "MEM_PMM_Read_Latency". I'm now thinking of using a more direct MetricExpr for "MEM_PMM_Read_Latency", such as:

{
    "MetricExpr": "( 1000000000 * ( UNC_CHA_TOR_OCCUPANCY.IA_MISS_DRD_PMM / UNC_CHA_TOR_INSERTS.IA_MISS_DRD_PMM ) / cha_0@event\\=0x0@ )",
    "BriefDescription": "Average latency of data read request to external 3D X-Point memory [in nanoseconds]. Accounts for demand loads and L1/L2 data-read prefetches",
    "MetricGroup": "MemoryLat;SoC;Server",
    "MetricName": "MEM_PMM_Read_Latency"
},

With this MetricExpr the test now passes.
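
A minimal sketch for spotting this kind of expression in a metrics JSON file before running the test. It assumes that any name characters glued directly after the closing '@' of a 'pmu@...@' term (such as '_pmm') are suspect. The regex, the function name find_suffixed_terms, and the second sample metric are illustrative assumptions, not perf's grammar or code; 'perf test 10' remains the real check.

```python
import json
import re

# Heuristic (an assumption, not perf's parser): a 'pmu@...@' term that is
# immediately followed by more name characters, e.g. '...@_pmm'.
SUFFIXED_TERM = re.compile(r"(\w+@[^@\s]+@)(\w+)")


def find_suffixed_terms(metrics_json):
    """Return (metric name, term, suffix) for each 'pmu@...@suffix' found."""
    hits = []
    for metric in json.loads(metrics_json):
        expr = metric.get("MetricExpr", "")
        for match in SUFFIXED_TERM.finditer(expr):
            name = metric.get("MetricName", "?")
            hits.append((name, match.group(1), match.group(2)))
    return hits


# The first entry uses the failing expression from the verbose test output;
# the second entry is a hypothetical metric that should not be flagged.
SAMPLE = r'''
[
  {
    "MetricExpr": "( 1000000000 * ( cha@event\\=0x36\\,umask\\=0x21\\,config\\=0x40433@_pmm / cha@event\\=0x35\\,umask\\=0x21\\,config\\=0x40433@_pmm ) / cha_0@event\\=0x0@ )",
    "MetricName": "MEM_PMM_Read_Latency"
  },
  {
    "MetricExpr": "cpu_clk_unhalted.ref_tsc / msr@tsc@",
    "MetricName": "example_ok_metric"
  }
]
'''

if __name__ == "__main__":
    for name, term, suffix in find_suffixed_terms(SAMPLE):
        print(f"{name}: suffix '{suffix}' after term {term}")
```

To check a real file, one could pass its contents instead of SAMPLE, for example the text of tools/perf/pmu-events/arch/x86/icelakex/icx-metrics.json.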

Thanks
Jin Yao