[linus:master] [block] 53889bcaf5: stress-ng.ioprio.ops_per_sec 13.0% improvement

From: kernel test robot
Date: Wed Jan 31 2024 - 09:09:33 EST




Hello,

kernel test robot noticed a 13.0% improvement of stress-ng.ioprio.ops_per_sec on:


commit: 53889bcaf536b3abedeaf104019877cee37dd08b ("block: make __get_task_ioprio() easier to read")
https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git master

testcase: stress-ng
test machine: 64 threads 2 sockets Intel(R) Xeon(R) Gold 6346 CPU @ 3.10GHz (Ice Lake) with 256G memory
parameters:

nr_threads: 10%
disk: 1HDD
testtime: 60s
fs: btrfs
test: ioprio
cpufreq_governor: performance






Details are as below:
-------------------------------------------------------------------------------------------------->


The kernel config and materials to reproduce are available at:
https://download.01.org/0day-ci/archive/20240131/202401311609.2c8c0628-oliver.sang@xxxxxxxxx

=========================================================================================
compiler/cpufreq_governor/disk/fs/kconfig/nr_threads/rootfs/tbox_group/test/testcase/testtime:
gcc-12/performance/1HDD/btrfs/x86_64-rhel-8.3/10%/debian-11.1-x86_64-20220510.cgz/lkp-icl-2sp8/ioprio/stress-ng/60s

commit:
3b7cb74547 ("block: move __get_task_ioprio() into header file")
53889bcaf5 ("block: make __get_task_ioprio() easier to read")

3b7cb745473aec72 53889bcaf536b3abedeaf104019
---------------- ---------------------------
         %stddev     %change         %stddev
             \          |                \
31039 ± 50% +62.9% 50565 ± 30% numa-vmstat.node1.nr_anon_pages
0.01 ± 20% -35.7% 0.00 ± 21% perf-sched.sch_delay.avg.ms.__cond_resched.smpboot_thread_fn.kthread.ret_from_fork.ret_from_fork_asm
70.30 -3.2% 68.07 turbostat.RAMWatt
14275 ±109% +173.4% 39022 ± 46% numa-meminfo.node1.AnonHugePages
124137 ± 50% +62.9% 202248 ± 30% numa-meminfo.node1.AnonPages
111.17 ± 16% -68.2% 35.33 ± 27% perf-c2c.DRAM.local
221.33 ± 5% -56.2% 97.00 ± 10% perf-c2c.DRAM.remote
524510 ± 8% +25.5% 658460 ± 10% sched_debug.cpu.max_idle_balance_cost.max
3643 ±165% +494.7% 21671 ± 34% sched_debug.cpu.max_idle_balance_cost.stddev
4756555 +13.0% 5374750 stress-ng.ioprio.ops
79272 +13.0% 89575 stress-ng.ioprio.ops_per_sec
4.52 -31.6% 3.09 ± 8% perf-stat.i.MPKI
3.514e+09 +6.2% 3.734e+09 perf-stat.i.branch-instructions
0.31 ± 6% -0.1 0.25 ± 7% perf-stat.i.branch-miss-rate%
12495630 ± 4% -12.9% 10889062 ± 6% perf-stat.i.branch-misses
9.01 -2.0 7.03 ± 8% perf-stat.i.cache-miss-rate%
73180840 -32.5% 49400992 ± 8% perf-stat.i.cache-misses
8.118e+08 -13.4% 7.029e+08 perf-stat.i.cache-references
1.41 +1.6% 1.43 perf-stat.i.cpi
328.29 ± 3% +46.7% 481.44 ± 6% perf-stat.i.cycles-between-cache-misses
3.862e+09 +6.8% 4.123e+09 perf-stat.i.dTLB-loads
0.00 -0.0 0.00 ± 4% perf-stat.i.dTLB-store-miss-rate%
1.695e+09 +11.5% 1.891e+09 perf-stat.i.dTLB-stores
1.658e+10 -1.8% 1.628e+10 perf-stat.i.instructions
154.35 +5.7% 163.21 perf-stat.i.metric.M/sec
7769868 ± 5% -66.9% 2575037 ± 8% perf-stat.i.node-load-misses
1891321 ± 12% -67.4% 616722 ± 38% perf-stat.i.node-loads
4.42 -31.3% 3.03 ± 7% perf-stat.overall.MPKI
0.36 ± 5% -0.1 0.29 ± 6% perf-stat.overall.branch-miss-rate%
9.02 -2.0 7.03 ± 8% perf-stat.overall.cache-miss-rate%
1.39 +1.6% 1.42 perf-stat.overall.cpi
315.95 +48.7% 469.87 ± 7% perf-stat.overall.cycles-between-cache-misses
0.00 -0.0 0.00 ± 4% perf-stat.overall.dTLB-store-miss-rate%
0.72 -1.6% 0.71 perf-stat.overall.ipc
3.455e+09 +6.2% 3.671e+09 perf-stat.ps.branch-instructions
12275780 ± 4% -12.8% 10699106 ± 6% perf-stat.ps.branch-misses
71968656 -32.5% 48582093 ± 8% perf-stat.ps.cache-misses
7.982e+08 -13.4% 6.912e+08 perf-stat.ps.cache-references
3.797e+09 +6.8% 4.054e+09 perf-stat.ps.dTLB-loads
1.667e+09 +11.5% 1.859e+09 perf-stat.ps.dTLB-stores
1.63e+10 -1.8% 1.601e+10 perf-stat.ps.instructions
7640492 ± 5% -66.9% 2532136 ± 8% perf-stat.ps.node-load-misses
1859653 ± 12% -67.4% 606316 ± 38% perf-stat.ps.node-loads
9.886e+11 -1.8% 9.707e+11 perf-stat.total.instructions
0.59 ± 2% +0.0 0.63 ± 2% perf-profile.calltrace.cycles-pp.__generic_file_write_iter.generic_file_write_iter.do_iter_readv_writev.do_iter_write.vfs_writev
0.63 ± 2% +0.0 0.68 ± 2% perf-profile.calltrace.cycles-pp.entry_SYSCALL_64_after_hwframe.fsync
0.62 ± 4% +0.0 0.68 ± 4% perf-profile.calltrace.cycles-pp.import_iovec.vfs_writev.__x64_sys_pwritev.do_syscall_64.entry_SYSCALL_64_after_hwframe
1.13 +0.1 1.18 ± 2% perf-profile.calltrace.cycles-pp.filemap_get_entry.__filemap_get_folio.simple_write_begin.generic_perform_write.generic_file_write_iter
1.52 +0.1 1.62 perf-profile.calltrace.cycles-pp.__filemap_get_folio.simple_write_begin.generic_perform_write.generic_file_write_iter.do_iter_readv_writev
1.59 +0.1 1.69 perf-profile.calltrace.cycles-pp.simple_write_begin.generic_perform_write.generic_file_write_iter.do_iter_readv_writev.do_iter_write
1.10 ± 3% +0.1 1.22 ± 2% perf-profile.calltrace.cycles-pp.security_task_getioprio.__do_sys_ioprio_get.do_syscall_64.entry_SYSCALL_64_after_hwframe.syscall
1.39 ± 2% +0.1 1.52 ± 3% perf-profile.calltrace.cycles-pp.fault_in_readable.fault_in_iov_iter_readable.generic_perform_write.generic_file_write_iter.do_iter_readv_writev
1.60 ± 2% +0.1 1.74 ± 2% perf-profile.calltrace.cycles-pp.fault_in_iov_iter_readable.generic_perform_write.generic_file_write_iter.do_iter_readv_writev.do_iter_write
0.35 ± 70% +0.2 0.54 perf-profile.calltrace.cycles-pp.entry_SYSCALL_64.syscall
1.60 ± 3% +0.2 1.79 ± 4% perf-profile.calltrace.cycles-pp.copy_page_from_iter_atomic.generic_perform_write.generic_file_write_iter.do_iter_readv_writev.do_iter_write
7.02 +0.6 7.62 ± 2% perf-profile.calltrace.cycles-pp.generic_perform_write.generic_file_write_iter.do_iter_readv_writev.do_iter_write.vfs_writev
8.19 +0.7 8.89 ± 2% perf-profile.calltrace.cycles-pp.generic_file_write_iter.do_iter_readv_writev.do_iter_write.vfs_writev.__x64_sys_pwritev
8.54 +0.7 9.26 ± 2% perf-profile.calltrace.cycles-pp.do_iter_readv_writev.do_iter_write.vfs_writev.__x64_sys_pwritev.do_syscall_64
9.20 +0.8 9.95 ± 2% perf-profile.calltrace.cycles-pp.do_iter_write.vfs_writev.__x64_sys_pwritev.do_syscall_64.entry_SYSCALL_64_after_hwframe
52.90 ± 2% +6.3 59.21 perf-profile.calltrace.cycles-pp._raw_spin_lock.get_task_ioprio.__do_sys_ioprio_get.do_syscall_64.entry_SYSCALL_64_after_hwframe
0.20 ± 3% +0.0 0.22 ± 2% perf-profile.children.cycles-pp.syscall_enter_from_user_mode
0.14 ± 7% +0.0 0.18 ± 13% perf-profile.children.cycles-pp.up_write
0.30 ± 5% +0.0 0.33 ± 5% perf-profile.children.cycles-pp.__fsnotify_parent
0.42 ± 2% +0.0 0.46 ± 3% perf-profile.children.cycles-pp.syscall_exit_to_user_mode
0.61 ± 2% +0.0 0.66 ± 3% perf-profile.children.cycles-pp.__generic_file_write_iter
1.15 +0.1 1.20 ± 2% perf-profile.children.cycles-pp.filemap_get_entry
0.32 ± 4% +0.1 0.38 ± 8% perf-profile.children.cycles-pp.__radix_tree_lookup
0.90 ± 2% +0.1 0.97 perf-profile.children.cycles-pp.entry_SYSRETQ_unsafe_stack
0.42 ± 4% +0.1 0.49 ± 6% perf-profile.children.cycles-pp.find_task_by_vpid
1.56 +0.1 1.64 perf-profile.children.cycles-pp.__filemap_get_folio
1.60 +0.1 1.70 perf-profile.children.cycles-pp.simple_write_begin
1.49 ± 2% +0.1 1.62 ± 3% perf-profile.children.cycles-pp.fault_in_readable
1.64 ± 2% +0.2 1.80 ± 3% perf-profile.children.cycles-pp.fault_in_iov_iter_readable
1.45 ± 3% +0.2 1.63 perf-profile.children.cycles-pp.security_task_getioprio
1.61 ± 3% +0.2 1.80 ± 4% perf-profile.children.cycles-pp.copy_page_from_iter_atomic
7.08 +0.6 7.68 ± 2% perf-profile.children.cycles-pp.generic_perform_write
8.22 +0.7 8.92 ± 2% perf-profile.children.cycles-pp.generic_file_write_iter
8.56 +0.7 9.27 ± 2% perf-profile.children.cycles-pp.do_iter_readv_writev
9.21 +0.8 9.97 ± 2% perf-profile.children.cycles-pp.do_iter_write
62.57 +1.3 63.83 perf-profile.children.cycles-pp.get_task_ioprio
53.95 ± 2% +6.2 60.16 perf-profile.children.cycles-pp._raw_spin_lock
9.16 ± 5% -5.0 4.15 ± 2% perf-profile.self.cycles-pp.get_task_ioprio
9.84 ± 9% -2.2 7.62 ± 4% perf-profile.self.cycles-pp.__do_sys_ioprio_get
0.16 ± 5% +0.0 0.18 ± 4% perf-profile.self.cycles-pp.syscall_enter_from_user_mode
0.18 ± 2% +0.0 0.21 ± 5% perf-profile.self.cycles-pp.fault_in_iov_iter_readable
0.49 +0.0 0.52 ± 2% perf-profile.self.cycles-pp.filemap_get_entry
0.29 ± 6% +0.0 0.33 ± 6% perf-profile.self.cycles-pp.__fsnotify_parent
0.88 ± 2% +0.1 0.93 perf-profile.self.cycles-pp.entry_SYSRETQ_unsafe_stack
0.31 ± 3% +0.1 0.37 ± 8% perf-profile.self.cycles-pp.__radix_tree_lookup
1.08 ± 4% +0.1 1.21 perf-profile.self.cycles-pp.security_task_getioprio
1.44 ± 2% +0.1 1.57 ± 3% perf-profile.self.cycles-pp.fault_in_readable
1.60 ± 3% +0.2 1.79 ± 4% perf-profile.self.cycles-pp.copy_page_from_iter_atomic
48.20 ± 4% +7.5 55.66 perf-profile.self.cycles-pp._raw_spin_lock
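
For readers who want to sanity-check the comparison table above, here is a minimal sketch of the arithmetic behind the %change column, using values copied from the table. The helper names (pct_change, point_delta) are hypothetical and are not part of lkp-tests. The sketch assumes that count-like metrics are compared as a relative percentage, that metrics which are already percentages (such as cache-miss-rate%) are compared as an absolute difference in points, and that the perf-stat.overall.* ratios can be recomputed from the perf-stat.ps.* counters. These assumptions are inferred from the numbers in this report, not taken from the tool's documentation.

```python
def pct_change(base: float, new: float) -> float:
    """Relative change from base to new, in percent (hypothetical helper)."""
    return (new - base) / base * 100.0


def point_delta(base_pct: float, new_pct: float) -> float:
    """Absolute difference between two percentages, in points (hypothetical helper)."""
    return new_pct - base_pct


# Headline metrics: parent commit 3b7cb74547 vs. 53889bcaf5.
ops_per_sec_change = pct_change(79272, 89575)
ops_change = pct_change(4756555, 5374750)

# A counter that dropped: perf-stat.ps.cache-misses.
cache_miss_change = pct_change(71968656, 48582093)

# A metric that is already a percentage: perf-stat.overall.cache-miss-rate%.
miss_rate_delta = point_delta(9.02, 7.03)

# Assumed derivations of two perf-stat.overall.* ratios for the parent commit.
mpki_base = 71968656 / 1.63e10 * 1000        # cache misses per 1000 instructions
miss_rate_base = 71968656 / 7.982e8 * 100    # cache misses / cache references

print(f"stress-ng.ioprio.ops_per_sec: {ops_per_sec_change:+.1f}%")
print(f"stress-ng.ioprio.ops:         {ops_change:+.1f}%")
print(f"perf-stat.ps.cache-misses:    {cache_miss_change:+.1f}%")
print(f"cache-miss-rate% delta:       {miss_rate_delta:+.1f} points")
print(f"MPKI (parent commit):         {mpki_base:.2f}")
print(f"cache-miss-rate% (parent):    {miss_rate_base:.2f}")
```

With the table's values, the recomputed figures agree with the reported ones after rounding, which supports (but does not prove) the assumed formulas.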




Disclaimer:
Results have been estimated based on internal Intel analysis and are provided
for informational purposes only. Any difference in system hardware or software
design or configuration may affect actual performance.


--
0-DAY CI Kernel Test Service
https://github.com/intel/lkp-tests/wiki