[linus:master] [sched/eevdf] 2227a957e1: will-it-scale.per_process_ops 2.5% improvement

From: kernel test robot
Date: Mon Jan 29 2024 - 09:17:54 EST




Hello,

kernel test robot noticed a 2.5% improvement of will-it-scale.per_process_ops on:


commit: 2227a957e1d5b1941be4e4207879ec74f4bb37f8 ("sched/eevdf: Sort the rbtree by virtual deadline")
https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git master

testcase: will-it-scale
test machine: 104 threads 2 sockets (Skylake) with 192G memory
parameters:

nr_task: 16
mode: process
test: sched_yield
cpufreq_governor: performance
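
The sched_yield testcase is a tight per-process yield loop; per_process_ops here is
consistent with the 16803184 total ops split across the 16 processes (~1050199 each).
As a rough illustration only (the actual will-it-scale source differs), a minimal C
sketch of the kind of loop this workload stresses:

	#include <sched.h>
	#include <stdio.h>
	#include <time.h>

	int main(void)
	{
		unsigned long long ops = 0;
		time_t end = time(NULL) + 5;	/* run for ~5 seconds */

		while (time(NULL) < end) {
			sched_yield();		/* syscall -> __schedule() -> pick_next_task_fair() */
			ops++;
		}
		printf("ops/sec ~= %llu\n", ops / 5);
		return 0;
	}

Each iteration is one syscall round trip, so nearly all cycles land in the syscall
entry/exit and scheduler-pick paths shown in the profiles below.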


In addition, the commit also has a significant impact on the following test:

+------------------+---------------------------------------------------------------+
| testcase: change | will-it-scale: will-it-scale.per_process_ops 2.6% improvement |
| test machine     | 104 threads 2 sockets (Skylake) with 192G memory              |
| test parameters  | cpufreq_governor=performance                                  |
|                  | mode=process                                                  |
|                  | nr_task=50%                                                   |
|                  | test=sched_yield                                              |
+------------------+---------------------------------------------------------------+



Details are as below:
-------------------------------------------------------------------------------------------------->


The kernel config and materials to reproduce are available at:
https://download.01.org/0day-ci/archive/20240129/202401292151.829b01b0-oliver.sang@xxxxxxxxx

=========================================================================================
compiler/cpufreq_governor/kconfig/mode/nr_task/rootfs/tbox_group/test/testcase:
gcc-12/performance/x86_64-rhel-8.3/process/16/debian-11.1-x86_64-20220510.cgz/lkp-skl-fpga01/sched_yield/will-it-scale

commit:
84db47ca71 ("sched/numa: Fix mm numa_scan_seq based unconditional scan")
2227a957e1 ("sched/eevdf: Sort the rbtree by virtual deadline")

84db47ca7146d7bd 2227a957e1d5b1941be4e420787
---------------- ---------------------------
     old value ±%stddev      %change      new value ±%stddev      metric
363.99 ±141% +104.2% 743.31 ± 69% numa-meminfo.node1.Inactive(file)
91.00 ±141% +104.2% 185.83 ± 69% numa-vmstat.node1.nr_inactive_file
91.00 ±141% +104.2% 185.83 ± 69% numa-vmstat.node1.nr_zone_inactive_file
16803184 +2.5% 17227597 will-it-scale.16.processes
1050198 +2.5% 1076724 will-it-scale.per_process_ops
16803184 +2.5% 17227597 will-it-scale.workload
1.70 ± 5% -12.0% 1.50 ± 4% perf-sched.wait_time.avg.ms.devkmsg_read.vfs_read.ksys_read.do_syscall_64
1.72 ± 5% -11.7% 1.51 ± 4% perf-sched.wait_time.avg.ms.syslog_print.do_syslog.kmsg_read.vfs_read
3.41 ± 5% -12.0% 3.00 ± 4% perf-sched.wait_time.max.ms.devkmsg_read.vfs_read.ksys_read.do_syscall_64
3.43 ± 5% -11.7% 3.03 ± 4% perf-sched.wait_time.max.ms.syslog_print.do_syslog.kmsg_read.vfs_read
0.18 +7.1% 0.19 perf-stat.i.MPKI
3.486e+09 -7.6% 3.222e+09 perf-stat.i.branch-instructions
1.34 +0.1 1.47 perf-stat.i.branch-miss-rate%
46582130 +1.6% 47319245 perf-stat.i.branch-misses
2.67 +8.3% 2.90 perf-stat.i.cpi
0.33 +0.0 0.36 perf-stat.i.dTLB-load-miss-rate%
18084714 +2.4% 18523285 perf-stat.i.dTLB-load-misses
5.491e+09 -5.2% 5.204e+09 perf-stat.i.dTLB-loads
3.036e+09 -1.1% 3.003e+09 perf-stat.i.dTLB-stores
741655 -4.3% 709869 ± 2% perf-stat.i.iTLB-loads
1.811e+10 -7.4% 1.677e+10 perf-stat.i.instructions
1115 -9.5% 1009 ± 5% perf-stat.i.instructions-per-iTLB-miss
0.38 -7.4% 0.35 perf-stat.i.ipc
115.51 -4.9% 109.88 perf-stat.i.metric.M/sec
0.21 ± 3% +7.6% 0.22 perf-stat.overall.MPKI
1.34 +0.1 1.47 perf-stat.overall.branch-miss-rate%
2.62 +8.0% 2.83 perf-stat.overall.cpi
0.33 +0.0 0.35 perf-stat.overall.dTLB-load-miss-rate%
0.00 +0.0 0.00 perf-stat.overall.dTLB-store-miss-rate%
1032 -10.4% 925.55 ± 5% perf-stat.overall.instructions-per-iTLB-miss
0.38 -7.4% 0.35 perf-stat.overall.ipc
324242 -9.7% 292715 perf-stat.overall.path-length
3.474e+09 -7.6% 3.211e+09 perf-stat.ps.branch-instructions
46423565 +1.6% 47153977 perf-stat.ps.branch-misses
18023667 +2.4% 18460935 perf-stat.ps.dTLB-load-misses
5.473e+09 -5.2% 5.186e+09 perf-stat.ps.dTLB-loads
3.026e+09 -1.1% 2.993e+09 perf-stat.ps.dTLB-stores
739444 -4.3% 707693 ± 2% perf-stat.ps.iTLB-loads
1.805e+10 -7.4% 1.671e+10 perf-stat.ps.instructions
5.448e+12 -7.4% 5.043e+12 perf-stat.total.instructions
7.82 ± 2% -1.5 6.30 perf-profile.calltrace.cycles-pp.pick_next_task_fair.__schedule.schedule.__x64_sys_sched_yield.do_syscall_64
12.22 ± 2% -1.3 10.90 perf-profile.calltrace.cycles-pp.__schedule.schedule.__x64_sys_sched_yield.do_syscall_64.entry_SYSCALL_64_after_hwframe
12.72 ± 2% -1.3 11.42 perf-profile.calltrace.cycles-pp.schedule.__x64_sys_sched_yield.do_syscall_64.entry_SYSCALL_64_after_hwframe.__sched_yield
15.64 ± 2% -1.1 14.55 perf-profile.calltrace.cycles-pp.__x64_sys_sched_yield.do_syscall_64.entry_SYSCALL_64_after_hwframe.__sched_yield
0.60 +0.0 0.64 ± 3% perf-profile.calltrace.cycles-pp.syscall_enter_from_user_mode.do_syscall_64.entry_SYSCALL_64_after_hwframe.__sched_yield
0.56 +0.0 0.61 ± 2% perf-profile.calltrace.cycles-pp._raw_spin_lock.do_sched_yield.__x64_sys_sched_yield.do_syscall_64.entry_SYSCALL_64_after_hwframe
2.28 ± 2% +0.2 2.44 ± 3% perf-profile.calltrace.cycles-pp.do_sched_yield.__x64_sys_sched_yield.do_syscall_64.entry_SYSCALL_64_after_hwframe.__sched_yield
8.28 +0.3 8.54 perf-profile.calltrace.cycles-pp.entry_SYSCALL_64.__sched_yield
0.09 ±223% +0.4 0.53 ± 3% perf-profile.calltrace.cycles-pp.update_min_vruntime.update_curr.pick_next_task_fair.__schedule.schedule
15.48 +0.5 16.01 perf-profile.calltrace.cycles-pp.syscall_exit_to_user_mode.do_syscall_64.entry_SYSCALL_64_after_hwframe.__sched_yield
0.00 +0.6 0.61 ± 3% perf-profile.calltrace.cycles-pp.pick_eevdf.pick_next_task_fair.__schedule.schedule.__x64_sys_sched_yield
2.38 ± 2% -2.2 0.23 ± 4% perf-profile.children.cycles-pp.pick_next_entity
8.21 ± 2% -1.5 6.71 perf-profile.children.cycles-pp.pick_next_task_fair
12.32 ± 2% -1.3 11.01 perf-profile.children.cycles-pp.__schedule
12.75 ± 2% -1.3 11.46 perf-profile.children.cycles-pp.schedule
15.82 ± 2% -1.1 14.75 perf-profile.children.cycles-pp.__x64_sys_sched_yield
0.64 +0.0 0.68 ± 3% perf-profile.children.cycles-pp.syscall_enter_from_user_mode
0.49 ± 4% +0.0 0.54 ± 3% perf-profile.children.cycles-pp.update_min_vruntime
1.19 ± 3% +0.1 1.28 perf-profile.children.cycles-pp._raw_spin_lock
2.34 ± 2% +0.2 2.51 ± 3% perf-profile.children.cycles-pp.do_sched_yield
8.17 +0.2 8.42 perf-profile.children.cycles-pp.entry_SYSCALL_64
15.76 +0.6 16.31 perf-profile.children.cycles-pp.syscall_exit_to_user_mode
0.00 +0.6 0.63 ± 3% perf-profile.children.cycles-pp.pick_eevdf
0.55 ± 2% -0.5 0.05 ± 45% perf-profile.self.cycles-pp.pick_next_entity
0.44 ± 4% +0.0 0.48 ± 3% perf-profile.self.cycles-pp.update_min_vruntime
1.30 +0.1 1.36 perf-profile.self.cycles-pp.__sched_yield
1.46 ± 3% +0.1 1.53 ± 2% perf-profile.self.cycles-pp.__schedule
1.14 ± 2% +0.1 1.22 perf-profile.self.cycles-pp._raw_spin_lock
7.13 +0.2 7.33 perf-profile.self.cycles-pp.entry_SYSCALL_64
9.36 ± 2% +0.3 9.70 perf-profile.self.cycles-pp.entry_SYSCALL_64_after_hwframe
14.93 +0.5 15.47 perf-profile.self.cycles-pp.syscall_exit_to_user_mode
0.00 +0.6 0.57 ± 3% perf-profile.self.cycles-pp.pick_eevdf
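
The profile shift above is the visible side of the commit: pick_next_entity self time
drops from ~0.55% to ~0.05% while a separate pick_eevdf symbol appears at ~0.6%, and
pick_next_task_fair as a whole is about 1.5 percentage points cheaper. The per-operation
cost also falls: perf-stat.overall.path-length, which lines up with total instructions
divided by will-it-scale.workload (5.448e12 / 16803184 ~= 324k), drops ~10% to ~293k
instructions per yield. Below is a simplified illustration of the idea behind the commit
(plain arrays and an unweighted average vruntime instead of the kernel's augmented,
load-weighted rbtree): once runnable entities are kept in virtual-deadline order, the
EEVDF pick is simply the first *eligible* entity in that order, so the common case
degenerates to checking the front of the queue.

	#include <stdio.h>
	#include <stdlib.h>

	struct entity {
		double vruntime;	/* virtual time already received */
		double deadline;	/* virtual deadline of the current request */
	};

	static int by_deadline(const void *a, const void *b)
	{
		const struct entity *x = a, *y = b;
		return (x->deadline > y->deadline) - (x->deadline < y->deadline);
	}

	/* Pick the eligible entity with the earliest virtual deadline. */
	static const struct entity *pick(const struct entity *q, int n, double avg_vruntime)
	{
		for (int i = 0; i < n; i++)			/* q[] is deadline-ordered */
			if (q[i].vruntime <= avg_vruntime)	/* eligible: not ahead of its fair share */
				return &q[i];			/* first eligible in deadline order wins */
		return NULL;
	}

	int main(void)
	{
		struct entity q[] = {
			{ .vruntime = 10.0, .deadline = 13.0 },
			{ .vruntime = 12.0, .deadline = 14.0 },
			{ .vruntime =  9.0, .deadline = 17.0 },
		};
		int n = sizeof(q) / sizeof(q[0]);
		double avg = (10.0 + 12.0 + 9.0) / n;	/* ~10.33 */

		qsort(q, n, sizeof(q[0]), by_deadline);

		const struct entity *p = pick(q, n, avg);
		if (p)
			printf("picked: vruntime=%.1f deadline=%.1f\n", p->vruntime, p->deadline);
		return 0;
	}

This prints the entity with vruntime 10.0 and deadline 13.0: it sits at the front of the
deadline-ordered queue and is eligible (10.0 <= ~10.33), so the pick stops at the first
element without examining the rest.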


***************************************************************************************************
lkp-skl-fpga01: 104 threads 2 sockets (Skylake) with 192G memory
=========================================================================================
compiler/cpufreq_governor/kconfig/mode/nr_task/rootfs/tbox_group/test/testcase:
gcc-12/performance/x86_64-rhel-8.3/process/50%/debian-11.1-x86_64-20220510.cgz/lkp-skl-fpga01/sched_yield/will-it-scale

commit:
84db47ca71 ("sched/numa: Fix mm numa_scan_seq based unconditional scan")
2227a957e1 ("sched/eevdf: Sort the rbtree by virtual deadline")

84db47ca7146d7bd 2227a957e1d5b1941be4e420787
---------------- ---------------------------
     old value ±%stddev      %change      new value ±%stddev      metric
0.01 ± 33% +56.6% 0.01 ± 15% perf-sched.sch_delay.avg.ms.exit_to_user_mode_loop.exit_to_user_mode_prepare.syscall_exit_to_user_mode.do_syscall_64
0.01 ± 13% +29.0% 0.01 ± 21% perf-sched.sch_delay.avg.ms.syslog_print.do_syslog.kmsg_read.vfs_read
54153138 +2.6% 55542860 will-it-scale.52.processes
1041406 +2.6% 1068131 will-it-scale.per_process_ops
54153138 +2.6% 55542860 will-it-scale.workload
125729 ± 92% -58.8% 51829 ± 20% numa-meminfo.node0.Mapped
3437584 ± 28% -56.5% 1494873 ± 61% numa-meminfo.node0.MemUsed
1980318 ± 52% -66.7% 660255 ±131% numa-meminfo.node0.Unevictable
814154 ±127% +162.1% 2134179 ± 40% numa-meminfo.node1.Unevictable
31380 ± 91% -58.7% 12965 ± 20% numa-vmstat.node0.nr_mapped
495079 ± 52% -66.7% 165063 ±131% numa-vmstat.node0.nr_unevictable
495079 ± 52% -66.7% 165063 ±131% numa-vmstat.node0.nr_zone_unevictable
203538 ±127% +162.1% 533544 ± 40% numa-vmstat.node1.nr_unevictable
203538 ±127% +162.1% 533544 ± 40% numa-vmstat.node1.nr_zone_unevictable
8.82 -1.6 7.23 perf-profile.calltrace.cycles-pp.pick_next_task_fair.__schedule.schedule.__x64_sys_sched_yield.do_syscall_64
13.35 -1.5 11.86 perf-profile.calltrace.cycles-pp.__schedule.schedule.__x64_sys_sched_yield.do_syscall_64.entry_SYSCALL_64_after_hwframe
13.86 -1.5 12.40 perf-profile.calltrace.cycles-pp.schedule.__x64_sys_sched_yield.do_syscall_64.entry_SYSCALL_64_after_hwframe.__sched_yield
16.88 -1.4 15.52 perf-profile.calltrace.cycles-pp.__x64_sys_sched_yield.do_syscall_64.entry_SYSCALL_64_after_hwframe.__sched_yield
2.48 ± 2% +0.1 2.56 perf-profile.calltrace.cycles-pp.entry_SYSCALL_64_safe_stack.__sched_yield
8.35 +0.3 8.60 perf-profile.calltrace.cycles-pp.entry_SYSCALL_64.__sched_yield
17.48 +0.5 17.96 perf-profile.calltrace.cycles-pp.syscall_return_via_sysret.__sched_yield
0.00 +0.6 0.63 ± 3% perf-profile.calltrace.cycles-pp.pick_eevdf.pick_next_task_fair.__schedule.schedule.__x64_sys_sched_yield
2.40 ± 3% -2.2 0.23 ± 5% perf-profile.children.cycles-pp.pick_next_entity
9.22 -1.6 7.65 perf-profile.children.cycles-pp.pick_next_task_fair
13.44 -1.5 11.96 perf-profile.children.cycles-pp.__schedule
13.89 -1.5 12.43 perf-profile.children.cycles-pp.schedule
17.07 -1.4 15.72 perf-profile.children.cycles-pp.__x64_sys_sched_yield
1.55 ± 2% +0.0 1.60 perf-profile.children.cycles-pp.entry_SYSCALL_64_safe_stack
8.22 +0.3 8.48 perf-profile.children.cycles-pp.entry_SYSCALL_64
17.65 +0.5 18.12 perf-profile.children.cycles-pp.syscall_return_via_sysret
0.00 +0.7 0.66 ± 2% perf-profile.children.cycles-pp.pick_eevdf
0.57 ± 2% -0.5 0.06 ± 8% perf-profile.self.cycles-pp.pick_next_entity
1.16 ± 2% +0.0 1.19 perf-profile.self.cycles-pp._raw_spin_lock
7.17 +0.2 7.39 perf-profile.self.cycles-pp.entry_SYSCALL_64
9.54 +0.3 9.88 perf-profile.self.cycles-pp.entry_SYSCALL_64_after_hwframe
17.60 +0.5 18.07 perf-profile.self.cycles-pp.syscall_return_via_sysret
0.00 +0.6 0.60 ± 3% perf-profile.self.cycles-pp.pick_eevdf
1.099e+10 -7.9% 1.012e+10 perf-stat.i.branch-instructions
1.08 +0.1 1.21 perf-stat.i.branch-miss-rate%
1.192e+08 +2.2% 1.219e+08 perf-stat.i.branch-misses
2.61 +8.6% 2.83 perf-stat.i.cpi
0.33 +0.0 0.35 perf-stat.i.dTLB-load-miss-rate%
56475655 +2.1% 57669096 perf-stat.i.dTLB-load-misses
1.743e+10 -5.4% 1.649e+10 perf-stat.i.dTLB-loads
9.656e+09 -1.2% 9.536e+09 perf-stat.i.dTLB-stores
55897710 +3.6% 57909818 ± 3% perf-stat.i.iTLB-load-misses
5.716e+10 -7.7% 5.276e+10 perf-stat.i.instructions
1103 -10.6% 987.24 ± 3% perf-stat.i.instructions-per-iTLB-miss
0.39 -7.6% 0.36 perf-stat.i.ipc
366.15 -5.1% 347.53 perf-stat.i.metric.M/sec
1.08 +0.1 1.20 perf-stat.overall.branch-miss-rate%
2.56 +8.2% 2.77 perf-stat.overall.cpi
0.32 +0.0 0.35 perf-stat.overall.dTLB-load-miss-rate%
1022 -10.8% 912.21 ± 3% perf-stat.overall.instructions-per-iTLB-miss
0.39 -7.6% 0.36 perf-stat.overall.ipc
317393 -9.9% 286100 perf-stat.overall.path-length
1.096e+10 -7.9% 1.009e+10 perf-stat.ps.branch-instructions
1.188e+08 +2.2% 1.215e+08 perf-stat.ps.branch-misses
56295357 +2.1% 57482750 perf-stat.ps.dTLB-load-misses
1.738e+10 -5.4% 1.643e+10 perf-stat.ps.dTLB-loads
9.625e+09 -1.2% 9.505e+09 perf-stat.ps.dTLB-stores
55706724 +3.6% 57713872 ± 3% perf-stat.ps.iTLB-load-misses
5.698e+10 -7.7% 5.259e+10 perf-stat.ps.instructions
1.719e+13 -7.5% 1.589e+13 perf-stat.total.instructions



Disclaimer:
Results have been estimated based on internal Intel analysis and are provided
for informational purposes only. Any difference in system hardware or software
design or configuration may affect actual performance.


--
0-DAY CI Kernel Test Service
https://github.com/intel/lkp-tests/wiki