Re: [PATCH next v2 2/5] locking/osq_lock: Optimise the vcpu_is_preempted() check.

From: kernel test robot
Date: Mon Jan 08 2024 - 02:42:33 EST

Hello,

kernel test robot noticed a 10.7% improvement in stress-ng.netlink-task.ops_per_sec on:


commit: d93300891f810c9498d09a6ecea2403d7a3546f0 ("[PATCH next v2 2/5] locking/osq_lock: Optimise the vcpu_is_preempted() check.")
url: https://github.com/intel-lab-lkp/linux/commits/David-Laight/locking-osq_lock-Defer-clearing-node-locked-until-the-slow-osq_lock-path/20240101-055853
base: https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git 610a9b8f49fbcf1100716370d3b5f6f884a2835a
patch link: https://lore.kernel.org/all/3a9d1782cd50436c99ced8c10175bae6@xxxxxxxxxxxxxxxx/
patch subject: [PATCH next v2 2/5] locking/osq_lock: Optimise the vcpu_is_preempted() check.

testcase: stress-ng
test machine: 64 threads 2 sockets Intel(R) Xeon(R) Gold 6346 CPU @ 3.10GHz (Ice Lake) with 256G memory
parameters:

nr_threads: 100%
testtime: 60s
sc_pid_max: 4194304
class: scheduler
test: netlink-task
cpufreq_governor: performance


Details are as below:
-------------------------------------------------------------------------------------------------->


The kernel config and materials to reproduce are available at:
https://download.01.org/0day-ci/archive/20240108/202401081557.641738f5-oliver.sang@xxxxxxxxx

=========================================================================================
class/compiler/cpufreq_governor/kconfig/nr_threads/rootfs/sc_pid_max/tbox_group/test/testcase/testtime:
scheduler/gcc-12/performance/x86_64-rhel-8.3/100%/debian-11.1-x86_64-20220510.cgz/4194304/lkp-icl-2sp8/netlink-task/stress-ng/60s

commit:
ff787c1fd0 ("locking/osq_lock: Defer clearing node->locked until the slow osq_lock() path.")
d93300891f ("locking/osq_lock: Optimise the vcpu_is_preempted() check.")

ff787c1fd0c13f9b  d93300891f810c9498d09a6ecea
----------------  ---------------------------
value ± %stddev   %change   value ± %stddev   metric
3880 ± 7% +26.4% 4903 ± 18% vmstat.system.cs
0.48 ±126% -99.8% 0.00 ±141% perf-sched.sch_delay.avg.ms.__cond_resched.aa_sk_perm.security_socket_recvmsg.sock_recvmsg.__sys_recvfrom
0.16 ± 23% -38.9% 0.10 ± 32% perf-sched.sch_delay.avg.ms.schedule_preempt_disabled.__mutex_lock.constprop.0.genl_rcv_msg
1.50 ±118% -99.9% 0.00 ±142% perf-sched.sch_delay.max.ms.__cond_resched.aa_sk_perm.security_socket_recvmsg.sock_recvmsg.__sys_recvfrom
2.55 ± 97% -83.7% 0.42 ±145% perf-sched.wait_time.max.ms.__cond_resched.__mutex_lock.constprop.0.genl_rcv_msg
32244865 +10.7% 35709040 stress-ng.netlink-task.ops
537413 +10.7% 595150 stress-ng.netlink-task.ops_per_sec
38094 ± 12% +42.2% 54160 ± 27% stress-ng.time.involuntary_context_switches
42290 ± 11% +39.8% 59117 ± 23% stress-ng.time.voluntary_context_switches
143.50 ± 7% -28.8% 102.17 ± 15% perf-c2c.DRAM.local
4955 ± 3% -26.4% 3647 ± 4% perf-c2c.DRAM.remote
4038 ± 2% -18.8% 3277 ± 4% perf-c2c.HITM.local
3483 ± 3% -21.1% 2747 ± 5% perf-c2c.HITM.remote
7521 ± 2% -19.9% 6024 ± 4% perf-c2c.HITM.total
0.42 ± 3% -16.2% 0.35 ± 5% perf-stat.i.MPKI
1.066e+10 +9.6% 1.169e+10 perf-stat.i.branch-instructions
51.90 -2.5 49.42 ± 2% perf-stat.i.cache-miss-rate%
22517746 ± 3% -13.4% 19503564 ± 5% perf-stat.i.cache-misses
3730 ± 7% +29.2% 4819 ± 19% perf-stat.i.context-switches
3.99 -3.1% 3.86 perf-stat.i.cpi
9535 ± 3% +15.4% 11003 ± 5% perf-stat.i.cycles-between-cache-misses
0.00 ± 3% +0.0 0.00 ± 3% perf-stat.i.dTLB-load-miss-rate%
1.419e+10 -14.9% 1.207e+10 perf-stat.i.dTLB-loads
8.411e+08 +8.4% 9.118e+08 perf-stat.i.dTLB-stores
5.36e+10 +3.1% 5.524e+10 perf-stat.i.instructions
0.26 +7.0% 0.28 ± 5% perf-stat.i.ipc
837.29 ± 3% -9.8% 755.30 ± 4% perf-stat.i.metric.K/sec
401.41 -4.1% 385.10 perf-stat.i.metric.M/sec
6404966 -23.2% 4920722 perf-stat.i.node-load-misses
141818 ± 4% -22.2% 110404 ± 4% perf-stat.i.node-loads
69.54 +13.8 83.36 perf-stat.i.node-store-miss-rate%
3935319 +10.4% 4345724 perf-stat.i.node-store-misses
1626033 -52.6% 771187 ± 5% perf-stat.i.node-stores
0.42 ± 3% -16.0% 0.35 ± 5% perf-stat.overall.MPKI
0.11 -0.0 0.10 ± 8% perf-stat.overall.branch-miss-rate%
51.32 -2.5 48.79 ± 2% perf-stat.overall.cache-miss-rate%
4.06 -3.0% 3.94 perf-stat.overall.cpi
9677 ± 3% +15.6% 11187 ± 5% perf-stat.overall.cycles-between-cache-misses
0.00 ± 3% +0.0 0.00 ± 4% perf-stat.overall.dTLB-load-miss-rate%
0.25 +3.1% 0.25 perf-stat.overall.ipc
70.78 +14.2 84.94 perf-stat.overall.node-store-miss-rate%
1.049e+10 +9.5% 1.149e+10 perf-stat.ps.branch-instructions
22167740 ± 3% -13.4% 19186498 ± 5% perf-stat.ps.cache-misses
3667 ± 7% +29.1% 4735 ± 19% perf-stat.ps.context-switches
1.396e+10 -15.0% 1.187e+10 perf-stat.ps.dTLB-loads
8.273e+08 +8.3% 8.963e+08 perf-stat.ps.dTLB-stores
5.274e+10 +3.0% 5.433e+10 perf-stat.ps.instructions
6303682 -23.2% 4839978 perf-stat.ps.node-load-misses
140690 ± 4% -22.5% 109023 ± 4% perf-stat.ps.node-loads
3875362 +10.3% 4276026 perf-stat.ps.node-store-misses
1599985 -52.6% 758184 ± 5% perf-stat.ps.node-stores
3.297e+12 +3.0% 3.396e+12 perf-stat.total.instructions
96.07 -0.2 95.87 perf-profile.calltrace.cycles-pp.osq_lock.__mutex_lock.genl_rcv_msg.netlink_rcv_skb.genl_rcv
97.52 -0.1 97.37 perf-profile.calltrace.cycles-pp.__mutex_lock.genl_rcv_msg.netlink_rcv_skb.genl_rcv.netlink_unicast
98.98 -0.1 98.90 perf-profile.calltrace.cycles-pp.netlink_rcv_skb.genl_rcv.netlink_unicast.netlink_sendmsg.__sys_sendto
98.99 -0.1 98.92 perf-profile.calltrace.cycles-pp.genl_rcv.netlink_unicast.netlink_sendmsg.__sys_sendto.__x64_sys_sendto
98.97 -0.1 98.89 perf-profile.calltrace.cycles-pp.genl_rcv_msg.netlink_rcv_skb.genl_rcv.netlink_unicast.netlink_sendmsg
99.09 -0.1 99.04 perf-profile.calltrace.cycles-pp.netlink_unicast.netlink_sendmsg.__sys_sendto.__x64_sys_sendto.do_syscall_64
99.47 -0.0 99.43 perf-profile.calltrace.cycles-pp.do_syscall_64.entry_SYSCALL_64_after_hwframe.sendto.stress_netlink_taskstats_monitor.stress_netlink_task
99.44 -0.0 99.40 perf-profile.calltrace.cycles-pp.__x64_sys_sendto.do_syscall_64.entry_SYSCALL_64_after_hwframe.sendto.stress_netlink_taskstats_monitor
99.35 -0.0 99.32 perf-profile.calltrace.cycles-pp.netlink_sendmsg.__sys_sendto.__x64_sys_sendto.do_syscall_64.entry_SYSCALL_64_after_hwframe
99.44 -0.0 99.40 perf-profile.calltrace.cycles-pp.__sys_sendto.__x64_sys_sendto.do_syscall_64.entry_SYSCALL_64_after_hwframe.sendto
96.08 -0.2 95.89 perf-profile.children.cycles-pp.osq_lock
97.52 -0.1 97.38 perf-profile.children.cycles-pp.__mutex_lock
98.98 -0.1 98.90 perf-profile.children.cycles-pp.netlink_rcv_skb
99.00 -0.1 98.92 perf-profile.children.cycles-pp.genl_rcv
98.97 -0.1 98.89 perf-profile.children.cycles-pp.genl_rcv_msg
99.20 -0.0 99.15 perf-profile.children.cycles-pp.netlink_unicast
0.13 ± 3% -0.0 0.08 ± 7% perf-profile.children.cycles-pp.genl_cmd_full_to_split
0.14 ± 4% -0.0 0.10 ± 5% perf-profile.children.cycles-pp.genl_get_cmd
99.36 -0.0 99.32 perf-profile.children.cycles-pp.netlink_sendmsg
99.44 -0.0 99.41 perf-profile.children.cycles-pp.__x64_sys_sendto
99.44 -0.0 99.41 perf-profile.children.cycles-pp.__sys_sendto
99.59 -0.0 99.56 perf-profile.children.cycles-pp.sendto
0.07 ± 5% +0.0 0.08 ± 5% perf-profile.children.cycles-pp.genl_family_rcv_msg_attrs_parse
0.11 +0.0 0.12 ± 6% perf-profile.children.cycles-pp.apparmor_capable
0.18 ± 3% +0.0 0.20 ± 4% perf-profile.children.cycles-pp.netlink_recvmsg
0.36 +0.0 0.38 perf-profile.children.cycles-pp.fill_stats
0.13 ± 3% +0.0 0.15 ± 4% perf-profile.children.cycles-pp.ns_capable
0.20 ± 3% +0.0 0.23 ± 4% perf-profile.children.cycles-pp.sock_recvmsg
0.24 ± 3% +0.0 0.27 ± 3% perf-profile.children.cycles-pp.__sys_recvfrom
0.24 ± 3% +0.0 0.27 ± 4% perf-profile.children.cycles-pp.__x64_sys_recvfrom
0.31 ± 3% +0.0 0.34 ± 3% perf-profile.children.cycles-pp.recv
1.22 +0.0 1.26 perf-profile.children.cycles-pp.genl_family_rcv_msg
0.85 +0.1 0.90 perf-profile.children.cycles-pp.cmd_attr_pid
0.94 +0.1 1.01 perf-profile.children.cycles-pp.genl_family_rcv_msg_doit
1.11 +0.1 1.23 perf-profile.children.cycles-pp.mutex_spin_on_owner
95.80 -0.2 95.62 perf-profile.self.cycles-pp.osq_lock
0.13 ± 3% -0.0 0.08 ± 7% perf-profile.self.cycles-pp.genl_cmd_full_to_split
0.11 ± 3% +0.0 0.12 ± 6% perf-profile.self.cycles-pp.apparmor_capable
1.11 +0.1 1.23 perf-profile.self.cycles-pp.mutex_spin_on_owner
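
For readers checking the table by hand, the sketch below shows how the %change column appears to be derived. It rests on an assumption read off the rows rather than on the lkp-tests source: entries with a trailing "%" are relative changes against the base commit, while entries without one (the rate and perf-profile rows) are absolute differences in percentage points. The helper names are hypothetical.

```python
# Sketch only: reproduces three %change entries from the table above.
# Assumption (not taken from lkp-tests): "%"-suffixed entries are relative
# changes vs. the base commit; unsuffixed entries are point differences.

def relative_change(base: float, patched: float) -> float:
    """Percent change from the base commit's value to the patched value."""
    return (patched - base) / base * 100.0


def point_change(base: float, patched: float) -> float:
    """Absolute difference, used for metrics that are already percentages."""
    return patched - base


# stress-ng.netlink-task.ops_per_sec: 537413 -> 595150
print(f"{relative_change(537413, 595150):+.1f}%")    # +10.7%

# perf-stat.i.node-stores: 1626033 -> 771187
print(f"{relative_change(1626033, 771187):+.1f}%")   # -52.6%

# perf-stat.i.node-store-miss-rate%: 69.54 -> 83.36
print(f"{point_change(69.54, 83.36):+.1f}")          # +13.8
```

The rounded results match the corresponding rows, which is consistent with the assumed reading of the column but does not confirm how the report generator computes it.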




Disclaimer:
Results have been estimated based on internal Intel analysis and are provided
for informational purposes only. Any difference in system hardware or software
design or configuration may affect actual performance.


--
0-DAY CI Kernel Test Service
https://github.com/intel/lkp-tests/wiki