[linus:master] [workqueue] 636b927eba: stress-ng.io.ops_per_sec 19.5% improvement

From: kernel test robot
Date: Fri Sep 22 2023 - 05:38:29 EST

Hello,

kernel test robot noticed a 19.5% improvement of stress-ng.io.ops_per_sec on:

commit: 636b927eba5bc633753f8eb80f35e1d5be806e51 ("workqueue: Make unbound workqueues to use per-cpu pool_workqueues")
https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git master
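
The 19.5% figure is the relative change between the two commits' mean stress-ng.io.ops_per_sec values in the comparison table below (49242 vs. 58826). Here is a minimal Python sketch of that arithmetic, using values copied from the table. `relative_change_pct` is an illustrative name and not an lkp-tests function:

```python
def relative_change_pct(base: float, patched: float) -> float:
    """Percentage change from the base commit's mean to the patched commit's mean."""
    return (patched - base) / base * 100.0


# Means copied from the comparison table (4cbfd3de73 -> 636b927eba).
rows = {
    "stress-ng.io.ops_per_sec": (49242, 58826),
    "stress-ng.io.ops": (2954572, 3529576),
}

for name, (base, patched) in rows.items():
    print(f"{name}: {relative_change_pct(base, patched):+.1f}%")
```

The table prints rounded means, so recomputing other rows this way can be off by a few tenths of a percent. For example, iostat.cpu.user (1.52 to 1.80) gives about +18.4%, while the reported value is +18.2%.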

testcase: stress-ng
test machine: 36 threads, 1 socket, Intel(R) Core(TM) i9-9980XE CPU @ 3.00GHz (Skylake) with 32G memory
parameters:

nr_threads: 10%
disk: 1SSD
testtime: 60s
fs: xfs
class: filesystem
test: io
cpufreq_governor: performance
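
With testtime fixed at 60s, stress-ng.io.ops and stress-ng.io.ops_per_sec in the table below rise by the same 19.5%. That is what you would expect if the rate is simply the total op count divided by the run time. The following quick Python check tests that assumption against the nominal 60 s. This report does not give the actual measured duration, so agreement is expected only to within about one op per second:

```python
TESTTIME_S = 60  # "testtime: 60s" from the parameters above


def approx_ops_per_sec(total_ops: int, seconds: float = TESTTIME_S) -> float:
    """Approximate throughput, assuming ops_per_sec is total ops over the run time."""
    return total_ops / seconds


# (stress-ng.io.ops, reported stress-ng.io.ops_per_sec) for the base and patched commits
samples = [(2954572, 49242), (3529576, 58826)]

for ops, reported in samples:
    est = approx_ops_per_sec(ops)
    print(f"{ops} ops / {TESTTIME_S}s = {est:.1f}/s (reported {reported})")
```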

Details are as below:
-------------------------------------------------------------------------------------------------->

The kernel config and materials to reproduce are available at:
https://download.01.org/0day-ci/archive/20230922/202309221737.2ee51a68-oliver.sang@xxxxxxxxx

=========================================================================================
class/compiler/cpufreq_governor/disk/fs/kconfig/nr_threads/rootfs/tbox_group/test/testcase/testtime:
filesystem/gcc-12/performance/1SSD/xfs/x86_64-rhel-8.3/10%/debian-11.1-x86_64-20220510.cgz/lkp-skl-d08/io/stress-ng/60s
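
The parameter line above consists of two parallel '/'-separated lists. The first names the job parameters, and the second gives their values in the same order. Here is a small Python sketch that pairs them up. `parse_job_line` is a hypothetical helper, and it assumes that no value contains '/', which holds here:

```python
HEADER = ("class/compiler/cpufreq_governor/disk/fs/kconfig/nr_threads/rootfs/"
          "tbox_group/test/testcase/testtime:")
VALUES = ("filesystem/gcc-12/performance/1SSD/xfs/x86_64-rhel-8.3/10%/"
          "debian-11.1-x86_64-20220510.cgz/lkp-skl-d08/io/stress-ng/60s")


def parse_job_line(header: str, values: str) -> dict[str, str]:
    """Pair each '/'-separated key with the value at the same position."""
    keys = header.rstrip(":").split("/")
    vals = values.split("/")
    if len(keys) != len(vals):
        raise ValueError(f"{len(keys)} keys but {len(vals)} values")
    return dict(zip(keys, vals))


params = parse_job_line(HEADER, VALUES)
for key, value in params.items():
    print(f"{key:>16}: {value}")
```

For instance, this maps tbox_group to lkp-skl-d08 and nr_threads to 10%, which matches the parameters listed at the top of the report.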

commit:
4cbfd3de73 ("workqueue: Call wq_update_unbound_numa() on all CPUs in NUMA node on CPU hotplug")
636b927eba ("workqueue: Make unbound workqueues to use per-cpu pool_workqueues")

4cbfd3de737b9d00 636b927eba5bc633753f8eb80f3
---------------- ---------------------------
base ± %stddev    %change    patched ± %stddev    metric
1.53 ± 2% +0.3 1.82 ± 3% mpstat.cpu.all.usr%
0.04 ± 25% -58.2% 0.02 ± 43% perf-sched.sch_delay.max.ms.worker_thread.kthread.ret_from_fork
7.29 -2.7% 7.09 iostat.cpu.system
1.52 ± 2% +18.2% 1.80 ± 3% iostat.cpu.user
58.72 ± 46% -63.9% 21.18 ± 50% sched_debug.cfs_rq:/.removed.load_avg.avg
205.63 ± 27% -47.3% 108.41 ± 52% sched_debug.cfs_rq:/.removed.load_avg.stddev
0.13 ± 3% +13.8% 0.15 ± 4% turbostat.IPC
82.74 +1.5% 83.95 turbostat.PkgWatt
2954572 +19.5% 3529576 ± 4% stress-ng.io.ops
49242 +19.5% 58826 ± 4% stress-ng.io.ops_per_sec
151.67 -3.8% 145.86 stress-ng.time.system_time
27.02 +21.6% 32.86 ± 4% stress-ng.time.user_time
1.017e+09 +21.7% 1.238e+09 ± 3% perf-stat.i.branch-instructions
2.07 -0.4 1.71 ± 3% perf-stat.i.branch-miss-rate%
1.42e+08 +20.9% 1.717e+08 ± 2% perf-stat.i.cache-references
2.54 -17.6% 2.09 ± 4% perf-stat.i.cpi
0.13 -0.0 0.12 perf-stat.i.dTLB-load-miss-rate%
1359466 +19.1% 1618588 ± 4% perf-stat.i.dTLB-load-misses
1.134e+09 +19.6% 1.356e+09 ± 3% perf-stat.i.dTLB-loads
0.00 ± 7% -0.0 0.00 ± 3% perf-stat.i.dTLB-store-miss-rate%
5.421e+08 +19.6% 6.483e+08 ± 3% perf-stat.i.dTLB-stores
63.26 ± 4% +6.1 69.35 ± 2% perf-stat.i.iTLB-load-miss-rate%
5.08e+09 +20.7% 6.131e+09 ± 3% perf-stat.i.instructions
0.42 +19.3% 0.50 ± 3% perf-stat.i.ipc
78.71 +20.4% 94.79 ± 3% perf-stat.i.metric.M/sec
2.23 -0.4 1.85 ± 3% perf-stat.overall.branch-miss-rate%
0.33 ± 4% -0.1 0.28 ± 6% perf-stat.overall.cache-miss-rate%
2.44 -16.8% 2.03 ± 3% perf-stat.overall.cpi
0.00 ± 4% -0.0 0.00 ± 3% perf-stat.overall.dTLB-store-miss-rate%
62.97 ± 5% +7.3 70.29 ± 3% perf-stat.overall.iTLB-load-miss-rate%
0.41 +20.4% 0.49 ± 3% perf-stat.overall.ipc
1.001e+09 +21.7% 1.218e+09 ± 3% perf-stat.ps.branch-instructions
1.398e+08 +20.9% 1.69e+08 ± 2% perf-stat.ps.cache-references
1337922 +19.1% 1592892 ± 4% perf-stat.ps.dTLB-load-misses
1.116e+09 +19.6% 1.334e+09 ± 3% perf-stat.ps.dTLB-loads
5.335e+08 +19.6% 6.38e+08 ± 3% perf-stat.ps.dTLB-stores
4.999e+09 +20.7% 6.033e+09 ± 3% perf-stat.ps.instructions
3.167e+11 +20.4% 3.811e+11 ± 3% perf-stat.total.instructions
21.48 ± 3% -7.5 13.96 ± 12% perf-profile.calltrace.cycles-pp._raw_spin_lock.iterate_supers.ksys_sync.__x64_sys_sync.do_syscall_64
18.33 ± 3% -7.0 11.30 ± 14% perf-profile.calltrace.cycles-pp.native_queued_spin_lock_slowpath._raw_spin_lock.iterate_supers.ksys_sync.__x64_sys_sync
35.16 ± 3% -6.9 28.21 ± 6% perf-profile.calltrace.cycles-pp.iterate_supers.ksys_sync.__x64_sys_sync.do_syscall_64.entry_SYSCALL_64_after_hwframe
36.20 ± 3% -6.9 29.35 ± 6% perf-profile.calltrace.cycles-pp.ksys_sync.__x64_sys_sync.do_syscall_64.entry_SYSCALL_64_after_hwframe.sync
36.20 ± 3% -6.9 29.35 ± 6% perf-profile.calltrace.cycles-pp.__x64_sys_sync.do_syscall_64.entry_SYSCALL_64_after_hwframe.sync
36.44 ± 3% -6.8 29.59 ± 6% perf-profile.calltrace.cycles-pp.do_syscall_64.entry_SYSCALL_64_after_hwframe.sync
36.54 ± 3% -6.8 29.71 ± 6% perf-profile.calltrace.cycles-pp.entry_SYSCALL_64_after_hwframe.sync
36.86 ± 3% -6.8 30.08 ± 6% perf-profile.calltrace.cycles-pp.sync
29.64 ± 8% -5.1 24.54 ± 15% perf-profile.calltrace.cycles-pp.cpu_startup_entry.start_secondary.secondary_startup_64_no_verify
29.64 ± 8% -5.1 24.54 ± 15% perf-profile.calltrace.cycles-pp.start_secondary.secondary_startup_64_no_verify
29.63 ± 8% -5.1 24.54 ± 15% perf-profile.calltrace.cycles-pp.do_idle.cpu_startup_entry.start_secondary.secondary_startup_64_no_verify
29.45 ± 8% -5.1 24.37 ± 15% perf-profile.calltrace.cycles-pp.cpuidle_idle_call.do_idle.cpu_startup_entry.start_secondary.secondary_startup_64_no_verify
29.07 ± 8% -5.1 24.00 ± 15% perf-profile.calltrace.cycles-pp.cpuidle_enter.cpuidle_idle_call.do_idle.cpu_startup_entry.start_secondary
29.78 ± 8% -4.1 25.66 ± 7% perf-profile.calltrace.cycles-pp.secondary_startup_64_no_verify
28.89 ± 8% -4.1 24.82 ± 7% perf-profile.calltrace.cycles-pp.cpuidle_enter_state.cpuidle_enter.cpuidle_idle_call.do_idle.cpu_startup_entry
26.19 ± 8% -4.0 22.20 ± 8% perf-profile.calltrace.cycles-pp.intel_idle_ibrs.cpuidle_enter_state.cpuidle_enter.cpuidle_idle_call.do_idle
9.73 ± 3% -1.8 7.94 ± 4% perf-profile.calltrace.cycles-pp.down_read.iterate_supers.ksys_sync.__x64_sys_sync.do_syscall_64
1.34 ± 5% +0.2 1.50 ± 4% perf-profile.calltrace.cycles-pp._find_next_bit.get_nr_inodes.get_nr_dirty_inodes.writeback_inodes_sb.sync_filesystem
2.11 ± 5% +0.2 2.32 ± 6% perf-profile.calltrace.cycles-pp.__entry_text_start.syncfs
1.14 ± 7% +0.2 1.36 ± 8% perf-profile.calltrace.cycles-pp.up_read.iterate_supers.ksys_sync.__x64_sys_sync.do_syscall_64
2.48 ± 6% +0.4 2.84 ± 4% perf-profile.calltrace.cycles-pp.down_read.__x64_sys_syncfs.do_syscall_64.entry_SYSCALL_64_after_hwframe.syncfs
4.64 ± 5% +0.6 5.23 ± 5% perf-profile.calltrace.cycles-pp.get_nr_inodes.get_nr_dirty_inodes.writeback_inodes_sb.sync_filesystem.__x64_sys_syncfs
4.99 ± 6% +0.6 5.58 ± 3% perf-profile.calltrace.cycles-pp.syscall_return_via_sysret.syncfs
4.33 ± 4% +0.6 4.97 ± 5% perf-profile.calltrace.cycles-pp.syscall_exit_to_user_mode.do_syscall_64.entry_SYSCALL_64_after_hwframe.syncfs
9.99 ± 5% +1.2 11.21 ± 4% perf-profile.calltrace.cycles-pp.get_nr_dirty_inodes.writeback_inodes_sb.sync_filesystem.__x64_sys_syncfs.do_syscall_64
10.31 ± 5% +1.3 11.56 ± 4% perf-profile.calltrace.cycles-pp.writeback_inodes_sb.sync_filesystem.__x64_sys_syncfs.do_syscall_64.entry_SYSCALL_64_after_hwframe
0.00 +1.3 1.35 ± 8% perf-profile.calltrace.cycles-pp.mutex_spin_on_owner.__mutex_lock.__flush_workqueue.xlog_cil_push_now.xlog_cil_force_seq
0.00 +1.8 1.78 ± 8% perf-profile.calltrace.cycles-pp.__mutex_lock.__flush_workqueue.xlog_cil_push_now.xlog_cil_force_seq.xfs_log_force
0.00 +2.6 2.64 ± 5% perf-profile.calltrace.cycles-pp.xlog_cil_push_now.xlog_cil_force_seq.xfs_log_force.xfs_fs_sync_fs.iterate_supers
0.00 +2.7 2.66 ± 5% perf-profile.calltrace.cycles-pp.xlog_cil_force_seq.xfs_log_force.xfs_fs_sync_fs.iterate_supers.ksys_sync
0.00 +2.7 2.69 ± 5% perf-profile.calltrace.cycles-pp.xfs_log_force.xfs_fs_sync_fs.iterate_supers.ksys_sync.__x64_sys_sync
0.00 +2.7 2.70 ± 5% perf-profile.calltrace.cycles-pp.xfs_fs_sync_fs.iterate_supers.ksys_sync.__x64_sys_sync.do_syscall_64
0.00 +6.5 6.49 ± 7% perf-profile.calltrace.cycles-pp._raw_spin_lock_irq.flush_workqueue_prep_pwqs.__flush_workqueue.xlog_cil_push_now.xlog_cil_force_seq
0.66 ± 9% +7.0 7.63 ± 6% perf-profile.calltrace.cycles-pp.xlog_cil_force_seq.xfs_log_force.xfs_fs_sync_fs.sync_filesystem.__x64_sys_syncfs
0.62 ± 10% +7.0 7.59 ± 6% perf-profile.calltrace.cycles-pp.xlog_cil_push_now.xlog_cil_force_seq.xfs_log_force.xfs_fs_sync_fs.sync_filesystem
0.70 ± 9% +7.0 7.68 ± 6% perf-profile.calltrace.cycles-pp.xfs_log_force.xfs_fs_sync_fs.sync_filesystem.__x64_sys_syncfs.do_syscall_64
0.77 ± 8% +7.0 7.76 ± 6% perf-profile.calltrace.cycles-pp.xfs_fs_sync_fs.sync_filesystem.__x64_sys_syncfs.do_syscall_64.entry_SYSCALL_64_after_hwframe
0.00 +7.5 7.46 ± 6% perf-profile.calltrace.cycles-pp.flush_workqueue_prep_pwqs.__flush_workqueue.xlog_cil_push_now.xlog_cil_force_seq.xfs_log_force
12.60 ± 5% +8.5 21.12 ± 5% perf-profile.calltrace.cycles-pp.sync_filesystem.__x64_sys_syncfs.do_syscall_64.entry_SYSCALL_64_after_hwframe.syncfs
16.57 ± 5% +9.2 25.72 ± 4% perf-profile.calltrace.cycles-pp.__x64_sys_syncfs.do_syscall_64.entry_SYSCALL_64_after_hwframe.syncfs
0.50 ± 45% +9.7 10.18 ± 6% perf-profile.calltrace.cycles-pp.__flush_workqueue.xlog_cil_push_now.xlog_cil_force_seq.xfs_log_force.xfs_fs_sync_fs
21.43 ± 5% +9.8 31.22 ± 4% perf-profile.calltrace.cycles-pp.do_syscall_64.entry_SYSCALL_64_after_hwframe.syncfs
24.17 ± 5% +10.1 34.30 ± 4% perf-profile.calltrace.cycles-pp.entry_SYSCALL_64_after_hwframe.syncfs
31.94 ± 5% +11.0 42.94 ± 4% perf-profile.calltrace.cycles-pp.syncfs
22.38 ± 3% -7.5 14.88 ± 11% perf-profile.children.cycles-pp._raw_spin_lock
18.34 ± 3% -7.0 11.30 ± 14% perf-profile.children.cycles-pp.native_queued_spin_lock_slowpath
35.22 ± 3% -6.9 28.30 ± 6% perf-profile.children.cycles-pp.iterate_supers
36.20 ± 3% -6.9 29.35 ± 6% perf-profile.children.cycles-pp.__x64_sys_sync
36.20 ± 3% -6.9 29.35 ± 6% perf-profile.children.cycles-pp.ksys_sync
36.88 ± 3% -6.8 30.10 ± 6% perf-profile.children.cycles-pp.sync
29.64 ± 8% -5.1 24.54 ± 15% perf-profile.children.cycles-pp.start_secondary
29.78 ± 8% -4.1 25.66 ± 7% perf-profile.children.cycles-pp.do_idle
29.78 ± 8% -4.1 25.66 ± 7% perf-profile.children.cycles-pp.secondary_startup_64_no_verify
29.78 ± 8% -4.1 25.66 ± 7% perf-profile.children.cycles-pp.cpu_startup_entry
29.20 ± 8% -4.1 25.10 ± 7% perf-profile.children.cycles-pp.cpuidle_enter
29.59 ± 8% -4.1 25.50 ± 7% perf-profile.children.cycles-pp.cpuidle_idle_call
29.19 ± 8% -4.1 25.10 ± 7% perf-profile.children.cycles-pp.cpuidle_enter_state
26.26 ± 8% -4.0 22.28 ± 8% perf-profile.children.cycles-pp.intel_idle_ibrs
12.26 ± 3% -1.4 10.84 ± 3% perf-profile.children.cycles-pp.down_read
1.89 ± 12% -0.3 1.62 ± 6% perf-profile.children.cycles-pp.__sysvec_apic_timer_interrupt
0.23 ± 7% -0.1 0.18 ± 11% perf-profile.children.cycles-pp.ktime_get
0.09 ± 10% +0.1 0.14 ± 11% perf-profile.children.cycles-pp.up_write
0.16 ± 14% +0.1 0.22 ± 10% perf-profile.children.cycles-pp.sync_fs_one_sb
0.36 ± 7% +0.1 0.44 ± 4% perf-profile.children.cycles-pp.entry_SYSRETQ_unsafe_stack
0.47 ± 7% +0.1 0.55 ± 5% perf-profile.children.cycles-pp.__fget_light
0.38 ± 11% +0.1 0.47 ± 6% perf-profile.children.cycles-pp.mutex_lock
0.46 ± 9% +0.1 0.56 ± 6% perf-profile.children.cycles-pp.__cond_resched
1.24 ± 5% +0.2 1.44 ± 5% perf-profile.children.cycles-pp.sync_inodes_sb
0.00 +0.2 0.22 ± 13% perf-profile.children.cycles-pp.osq_lock
0.44 ± 10% +0.2 0.66 ± 7% perf-profile.children.cycles-pp.mutex_unlock
2.51 ± 6% +0.3 2.77 ± 6% perf-profile.children.cycles-pp.__entry_text_start
1.46 ± 8% +0.3 1.81 ± 6% perf-profile.children.cycles-pp.up_read
4.66 ± 4% +0.6 5.28 ± 5% perf-profile.children.cycles-pp.syscall_exit_to_user_mode
4.92 ± 5% +0.6 5.55 ± 4% perf-profile.children.cycles-pp.get_nr_inodes
10.29 ± 5% +1.2 11.54 ± 4% perf-profile.children.cycles-pp.get_nr_dirty_inodes
10.32 ± 5% +1.3 11.58 ± 4% perf-profile.children.cycles-pp.writeback_inodes_sb
0.00 +1.6 1.62 ± 7% perf-profile.children.cycles-pp.mutex_spin_on_owner
0.00 +2.1 2.12 ± 7% perf-profile.children.cycles-pp.__mutex_lock
0.25 ± 11% +6.4 6.63 ± 7% perf-profile.children.cycles-pp._raw_spin_lock_irq
0.49 ± 9% +7.0 7.50 ± 6% perf-profile.children.cycles-pp.flush_workqueue_prep_pwqs
12.65 ± 5% +8.5 21.16 ± 5% perf-profile.children.cycles-pp.sync_filesystem
16.61 ± 5% +9.2 25.76 ± 4% perf-profile.children.cycles-pp.__x64_sys_syncfs
0.86 ± 10% +9.3 10.19 ± 6% perf-profile.children.cycles-pp.__flush_workqueue
0.91 ± 8% +9.3 10.24 ± 6% perf-profile.children.cycles-pp.xlog_cil_push_now
0.97 ± 8% +9.3 10.30 ± 6% perf-profile.children.cycles-pp.xlog_cil_force_seq
1.11 ± 8% +9.4 10.46 ± 6% perf-profile.children.cycles-pp.xfs_fs_sync_fs
1.02 ± 9% +9.4 10.38 ± 6% perf-profile.children.cycles-pp.xfs_log_force
32.30 ± 5% +11.1 43.37 ± 4% perf-profile.children.cycles-pp.syncfs
18.22 ± 3% -7.0 11.25 ± 14% perf-profile.self.cycles-pp.native_queued_spin_lock_slowpath
26.26 ± 8% -4.0 22.27 ± 8% perf-profile.self.cycles-pp.intel_idle_ibrs
11.86 ± 3% -1.5 10.36 ± 3% perf-profile.self.cycles-pp.down_read
3.99 ± 2% -0.5 3.53 ± 4% perf-profile.self.cycles-pp._raw_spin_lock
1.62 ± 6% -0.4 1.27 ± 9% perf-profile.self.cycles-pp.iterate_supers
0.12 ± 12% -0.0 0.09 ± 13% perf-profile.self.cycles-pp.ktime_get
0.02 ±141% +0.0 0.06 ± 13% perf-profile.self.cycles-pp.writeback_inodes_sb
0.08 ± 13% +0.1 0.14 ± 12% perf-profile.self.cycles-pp.up_write
0.16 ± 14% +0.1 0.22 ± 10% perf-profile.self.cycles-pp.sync_fs_one_sb
0.39 ± 7% +0.1 0.46 ± 7% perf-profile.self.cycles-pp.syncfs
0.31 ± 8% +0.1 0.38 ± 4% perf-profile.self.cycles-pp.entry_SYSRETQ_unsafe_stack
0.46 ± 7% +0.1 0.54 ± 6% perf-profile.self.cycles-pp.__fget_light
0.34 ± 10% +0.1 0.43 ± 6% perf-profile.self.cycles-pp.mutex_lock
0.30 ± 13% +0.1 0.39 ± 7% perf-profile.self.cycles-pp.__cond_resched
0.32 ± 7% +0.1 0.43 ± 6% perf-profile.self.cycles-pp.sync_filesystem
0.00 +0.2 0.22 ± 13% perf-profile.self.cycles-pp.osq_lock
0.43 ± 10% +0.2 0.65 ± 7% perf-profile.self.cycles-pp.mutex_unlock
0.00 +0.2 0.24 ± 8% perf-profile.self.cycles-pp.__mutex_lock
1.40 ± 7% +0.3 1.72 ± 6% perf-profile.self.cycles-pp.up_read
2.99 ± 5% +0.4 3.36 ± 3% perf-profile.self.cycles-pp.entry_SYSCALL_64_after_hwframe
4.21 ± 5% +0.5 4.71 ± 5% perf-profile.self.cycles-pp.get_nr_dirty_inodes
3.82 ± 4% +0.5 4.33 ± 5% perf-profile.self.cycles-pp.get_nr_inodes
4.48 ± 3% +0.6 5.11 ± 6% perf-profile.self.cycles-pp.syscall_exit_to_user_mode
0.26 ± 6% +0.6 0.91 ± 7% perf-profile.self.cycles-pp.flush_workqueue_prep_pwqs
0.00 +1.6 1.61 ± 7% perf-profile.self.cycles-pp.mutex_spin_on_owner
0.24 ± 11% +6.3 6.58 ± 7% perf-profile.self.cycles-pp._raw_spin_lock_irq
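
This Python sketch cross-checks two kinds of rows in the table above. First, for metrics that are already percentages (mpstat.cpu.all.usr% and the perf-profile.* rows), the %change column matches the plain difference in percentage points, not a relative change. Second, the overall CPI and IPC rows are reciprocals of each other. The values are copied from the table. `point_change` is an illustrative name and not an lkp-tests function:

```python
def point_change(base_pct: float, patched_pct: float) -> float:
    """Absolute difference, in percentage points, between two percentage metrics."""
    return patched_pct - base_pct


# (metric, base, patched, reported %change) copied from the table above
percent_rows = [
    ("mpstat.cpu.all.usr%", 1.53, 1.82, +0.3),
    ("perf-profile.calltrace.cycles-pp.sync", 36.86, 30.08, -6.8),
    ("perf-profile.self.cycles-pp._raw_spin_lock_irq", 0.24, 6.58, +6.3),
]

for name, base, patched, reported in percent_rows:
    delta = point_change(base, patched)
    print(f"{name}: {delta:+.2f} points (reported {reported:+.1f})")

# perf-stat.overall.cpi vs. perf-stat.overall.ipc: IPC = 1 / CPI,
# so the two rows should agree to display precision.
cpi_ipc_pairs = [(2.44, 0.41), (2.03, 0.49)]
for cpi, ipc in cpi_ipc_pairs:
    print(f"1 / {cpi} = {1 / cpi:.2f} (reported ipc {ipc})")
```

The reciprocal check applies only to the perf-stat.overall.* rows. The per-interval perf-stat.i.* rows are averaged separately, and 1 / 2.54 does not round to the reported 0.42.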

Disclaimer:
Results have been estimated based on internal Intel analysis and are provided
for informational purposes only. Any difference in system hardware or software
design or configuration may affect actual performance.

--
0-DAY CI Kernel Test Service
https://github.com/intel/lkp-tests/wiki