[linus:master] [file] 0ede61d858: will-it-scale.per_thread_ops -2.9% regression

From: kernel test robot
Date: Mon Nov 20 2023 - 02:11:53 EST




Hello,

kernel test robot noticed a 2.9% regression (-2.9% change) in will-it-scale.per_thread_ops on:


commit: 0ede61d8589cc2d93aa78230d74ac58b5b8d0244 ("file: convert to SLAB_TYPESAFE_BY_RCU")
https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git master

testcase: will-it-scale
test machine: 224 threads 4 sockets Intel(R) Xeon(R) Platinum 8380H CPU @ 2.90GHz (Cooper Lake) with 192G memory
parameters:

nr_task: 16
mode: thread
test: poll2
cpufreq_governor: performance
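
For readers unfamiliar with this kind of workload, here is a rough Python
sketch of a poll2-style loop in thread mode: several threads share one file
descriptor table, and each thread repeatedly calls poll() with a zero timeout
on its own set of open files. This is not the will-it-scale source (which is
written in C). The helper name run_poll_threads, the file count, and the run
time are assumptions chosen for illustration only.

```python
import select
import tempfile
import threading
import time


def run_poll_threads(nr_task=16, nr_files=8, seconds=0.5):
    """Hypothetical helper: nr_task threads each poll nr_files files in a loop.

    Returns the total number of poll() calls completed by all threads.
    """
    stop = threading.Event()
    counts = [0] * nr_task

    def worker(idx):
        # Each thread opens its own files, but all threads share the
        # process-wide file descriptor table.
        files = [tempfile.TemporaryFile() for _ in range(nr_files)]
        poller = select.poll()
        for f in files:
            poller.register(f.fileno(), select.POLLOUT)
        ops = 0
        while not stop.is_set():
            poller.poll(0)  # zero timeout: return immediately
            ops += 1
        counts[idx] = ops
        for f in files:
            f.close()

    threads = [threading.Thread(target=worker, args=(i,)) for i in range(nr_task)]
    for t in threads:
        t.start()
    time.sleep(seconds)
    stop.set()
    for t in threads:
        t.join()
    return sum(counts)


if __name__ == "__main__":
    total = run_poll_threads(nr_task=4, nr_files=8, seconds=0.2)
    print("total poll() calls:", total)
```

Every poll() call makes the kernel look up each registered descriptor. With a
shared descriptor table, that lookup takes and drops a file reference, which
is consistent with the fput and __fget_light entries in the profile data in
this report.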




If you fix the issue in a separate patch/commit (i.e. not just a new version of
the same patch/commit), kindly add the following tags
| Reported-by: kernel test robot <oliver.sang@xxxxxxxxx>
| Closes: https://lore.kernel.org/oe-lkp/202311201406.2022ca3f-oliver.sang@xxxxxxxxx


Details are as below:
-------------------------------------------------------------------------------------------------->


The kernel config and materials to reproduce are available at:
https://download.01.org/0day-ci/archive/20231120/202311201406.2022ca3f-oliver.sang@xxxxxxxxx

=========================================================================================
compiler/cpufreq_governor/kconfig/mode/nr_task/rootfs/tbox_group/test/testcase:
gcc-12/performance/x86_64-rhel-8.3/thread/16/debian-11.1-x86_64-20220510.cgz/lkp-cpl-4sp2/poll2/will-it-scale

commit:
93faf426e3 ("vfs: shave work on failed file open")
0ede61d858 ("file: convert to SLAB_TYPESAFE_BY_RCU")

93faf426e3cc000c 0ede61d8589cc2d93aa78230d74
---------------- ---------------------------
           %stddev   %change            %stddev
            \           |                \
      0.01 ± 9%    +58125.6%       4.17 ±175%  perf-sched.sch_delay.max.ms.schedule_timeout.rcu_gp_fqs_loop.rcu_gp_kthread.kthread
     89056             -2.0%      87309        proc-vmstat.nr_slab_unreclaimable
     97958 ± 7%        -9.7%      88449 ± 4%   sched_debug.cpu.avg_idle.stddev
      0.00 ± 12%      +24.2%       0.00 ± 17%  sched_debug.cpu.next_balance.stddev
   6391048             -2.9%    6208584        will-it-scale.16.threads
    399440             -2.9%     388036        will-it-scale.per_thread_ops
   6391048             -2.9%    6208584        will-it-scale.workload
     19.99 ± 4%         -2.2      17.74        perf-profile.calltrace.cycles-pp.fput.do_poll.do_sys_poll.__x64_sys_poll.do_syscall_64
      1.27 ± 5%         +0.8       2.11 ± 3%   perf-profile.calltrace.cycles-pp.__fdget.do_poll.do_sys_poll.__x64_sys_poll.do_syscall_64
     32.69 ± 4%         +5.0      37.70        perf-profile.calltrace.cycles-pp.__fget_light.do_poll.do_sys_poll.__x64_sys_poll.do_syscall_64
      0.00             +27.9      27.85        perf-profile.calltrace.cycles-pp.__get_file_rcu.__fget_light.do_poll.do_sys_poll.__x64_sys_poll
     20.00 ± 4%         -2.3      17.75        perf-profile.children.cycles-pp.fput
      0.24 ± 10%        -0.1       0.18 ± 2%   perf-profile.children.cycles-pp.syscall_return_via_sysret
      1.48 ± 5%         +0.5       1.98 ± 3%   perf-profile.children.cycles-pp.__fdget
     31.85 ± 4%         +6.0      37.86        perf-profile.children.cycles-pp.__fget_light
      0.00             +27.7      27.67        perf-profile.children.cycles-pp.__get_file_rcu
     30.90 ± 4%        -20.6      10.35 ± 2%   perf-profile.self.cycles-pp.__fget_light
     19.94 ± 4%         -2.4      17.53        perf-profile.self.cycles-pp.fput
      9.81 ± 4%         -2.4       7.42 ± 2%   perf-profile.self.cycles-pp.do_poll
      0.23 ± 11%        -0.1       0.17 ± 4%   perf-profile.self.cycles-pp.syscall_return_via_sysret
      0.00             +26.5      26.48        perf-profile.self.cycles-pp.__get_file_rcu
 2.146e+10 ± 2%        +8.5%  2.329e+10 ± 2%   perf-stat.i.branch-instructions
      0.22 ± 14%        -0.0       0.19 ± 14%  perf-stat.i.branch-miss-rate%
 1.404e+10 ± 2%        +8.7%  1.526e+10 ± 2%   perf-stat.i.dTLB-stores
     70.87              -2.3      68.59        perf-stat.i.iTLB-load-miss-rate%
   5267608             -5.5%    4979133 ± 2%   perf-stat.i.iTLB-load-misses
   2102507             +5.4%    2215725        perf-stat.i.iTLB-loads
     18791 ± 3%       +10.5%      20757 ± 2%   perf-stat.i.instructions-per-iTLB-miss
    266.67 ± 2%        +6.8%     284.75 ± 2%   perf-stat.i.metric.M/sec
      0.01 ± 10%      -10.5%       0.01 ± 5%   perf-stat.overall.MPKI
      0.19              -0.0       0.17        perf-stat.overall.branch-miss-rate%
      0.65             -3.1%       0.63        perf-stat.overall.cpi
      0.00 ± 4%         -0.0       0.00 ± 4%   perf-stat.overall.dTLB-store-miss-rate%
     71.48              -2.3      69.21        perf-stat.overall.iTLB-load-miss-rate%
     18757            +10.0%      20629        perf-stat.overall.instructions-per-iTLB-miss
      1.54             +3.2%       1.59        perf-stat.overall.ipc
   4795147             +6.4%    5100406        perf-stat.overall.path-length
  2.14e+10 ± 2%        +8.5%  2.322e+10 ± 2%   perf-stat.ps.branch-instructions
   1.4e+10 ± 2%        +8.7%  1.522e+10 ± 2%   perf-stat.ps.dTLB-stores
   5253923             -5.5%    4966218 ± 2%   perf-stat.ps.iTLB-load-misses
   2095770             +5.4%    2208605        perf-stat.ps.iTLB-loads
 3.065e+13             +3.3%  3.167e+13        perf-stat.total.instructions
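
As a sanity check on how the figures in the table relate to each other, the
short Python sketch below recomputes a few of them from the reported values.
Three relationships are assumptions that happen to match the numbers, not
definitions taken from this report: per_thread_ops is workload divided by
nr_task, path-length is total instructions divided by workload, and ipc is
the reciprocal of cpi. The variable names are illustrative.

```python
def pct_change(base, new):
    """Relative change from base to new, in percent."""
    return (new - base) / base * 100.0


NR_TASK = 16

# will-it-scale.workload for the parent commit and for 0ede61d858.
base_workload, new_workload = 6391048, 6208584

# Assumed relation: per_thread_ops ~= workload / nr_task.
per_thread = (base_workload // NR_TASK, new_workload // NR_TASK)
workload_change = pct_change(base_workload, new_workload)

# ipc is the reciprocal of cpi (reported cpi: 0.65 -> 0.63).
ipc = (1 / 0.65, 1 / 0.63)

# Assumed relation: path-length ~= total instructions / workload.
path_length = (3.065e13 / base_workload, 3.167e13 / new_workload)

# Self cycles of the fd lookup path: __fget_light alone before the commit,
# __fget_light plus the new __get_file_rcu after it (percentage points).
lookup_self_base = 30.90
lookup_self_new = 10.35 + 26.48
lookup_self_delta = lookup_self_new - lookup_self_base

if __name__ == "__main__":
    print("per_thread_ops:", per_thread)
    print("workload change: %.1f%%" % workload_change)
    print("ipc: %.2f -> %.2f" % ipc)
    print("path-length: %.0f -> %.0f" % path_length)
    print("lookup self cycles delta: %+.2f points" % lookup_self_delta)
```

Read this way, the 20.6-point drop in __fget_light self time is more than
offset by the 26.48 points now attributed to __get_file_rcu, and the longer
path-length indicates more instructions executed per operation even though
ipc improved slightly.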




Disclaimer:
Results have been estimated based on internal Intel analysis and are provided
for informational purposes only. Any difference in system hardware or software
design or configuration may affect actual performance.


--
0-DAY CI Kernel Test Service
https://github.com/intel/lkp-tests/wiki