[linus:master] [mm] 24526268f4: stress-ng.numa.ops_per_sec 4.7% improvement

From: kernel test robot
Date: Sat Oct 07 2023 - 03:09:16 EST




Hello,

kernel test robot noticed a 4.7% improvement of stress-ng.numa.ops_per_sec on:


commit: 24526268f4e38c9ec0c4a30de4f37ad2a2a84e47 ("mm: mempolicy: keep VMA walk if both MPOL_MF_STRICT and MPOL_MF_MOVE are specified")
https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git master

testcase: stress-ng
test machine: 36 threads, 1 socket Intel(R) Core(TM) i9-10980XE CPU @ 3.00GHz (Cascade Lake) with 128G memory
parameters:

nr_threads: 1
testtime: 60s
class: cpu
test: numa
cpufreq_governor: performance
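
For orientation, the parameters above map roughly onto a stress-ng command line. The sketch below only builds the argument list and does not run anything. The mapping of nr_threads to the `--numa` worker count and the use of `--metrics-brief` are assumptions; the helper name stress_ng_numa_cmd is hypothetical. The exact invocation used by the robot is in the reproduce materials linked further down, not here, and the cpufreq governor setting is not covered.

```python
import shlex


def stress_ng_numa_cmd(nr_threads=1, testtime="60s"):
    """Build (but do not run) a stress-ng command line for the numa stressor.

    Assumption: nr_threads corresponds to the number of numa stressor
    workers, and testtime is passed straight through to --timeout.
    """
    return [
        "stress-ng",
        "--numa", str(nr_threads),   # number of numa stressor instances
        "--timeout", testtime,       # run duration, e.g. "60s"
        "--metrics-brief",           # print bogo-ops summary at the end
    ]


if __name__ == "__main__":
    # Print the command so it can be inspected or pasted into a shell.
    print(shlex.join(stress_ng_numa_cmd()))
```

The list form can be handed to subprocess.run() unchanged if one actually wants to execute it on a NUMA-capable test box.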


The commit also has a significant impact on the following test:

+------------------+-------------------------------------------------------------------------------------------------+
| testcase: change | stress-ng: stress-ng.numa.ops_per_sec 4.5% improvement                                          |
| test machine     | 36 threads, 1 socket Intel(R) Core(TM) i9-10980XE CPU @ 3.00GHz (Cascade Lake) with 128G memory |
| test parameters  | class=os                                                                                        |
|                  | cpufreq_governor=performance                                                                    |
|                  | disk=1HDD                                                                                       |
|                  | fs=ext4                                                                                         |
|                  | nr_threads=1                                                                                    |
|                  | test=numa                                                                                       |
|                  | testtime=60s                                                                                    |
+------------------+-------------------------------------------------------------------------------------------------+




Details are as below:
-------------------------------------------------------------------------------------------------->


The kernel config and materials to reproduce are available at:
https://download.01.org/0day-ci/archive/20231007/202310071416.df82eed7-oliver.sang@xxxxxxxxx

=========================================================================================
class/compiler/cpufreq_governor/kconfig/nr_threads/rootfs/tbox_group/test/testcase/testtime:
cpu/gcc-12/performance/x86_64-rhel-8.3/1/debian-11.1-x86_64-20220510.cgz/lkp-csl-d02/numa/stress-ng/60s

commit:
45120b1574 ("mm/damon/vaddr-test: fix memory leak in damon_do_test_apply_three_regions()")
24526268f4 ("mm: mempolicy: keep VMA walk if both MPOL_MF_STRICT and MPOL_MF_MOVE are specified")

45120b15743fa7c0 24526268f4e38c9ec0c4a30de4f
---------------- ---------------------------
value ± %stddev   %change   value ± %stddev   metric
272.18 ± 77% -99.9% 0.31 ±220% perf-sched.wait_time.avg.ms.exit_to_user_mode_loop.exit_to_user_mode_prepare.syscall_exit_to_user_mode.do_syscall_64
1089 +4.7% 1141 stress-ng.numa.ops
18.16 +4.7% 19.01 stress-ng.numa.ops_per_sec
20387 +5.2% 21456 stress-ng.time.involuntary_context_switches
2.173e+09 +3.6% 2.251e+09 perf-stat.i.branch-instructions
0.50 -3.5% 0.48 perf-stat.i.cpi
1.865e+09 +3.6% 1.932e+09 perf-stat.i.dTLB-loads
1.06e+10 +3.4% 1.096e+10 perf-stat.i.instructions
2.02 +3.8% 2.10 perf-stat.i.ipc
130.34 +3.1% 134.39 perf-stat.i.metric.M/sec
0.50 -3.6% 0.49 perf-stat.overall.cpi
1.99 +3.7% 2.06 perf-stat.overall.ipc
2.139e+09 +3.6% 2.216e+09 perf-stat.ps.branch-instructions
1.836e+09 +3.6% 1.901e+09 perf-stat.ps.dTLB-loads
1.043e+10 +3.4% 1.079e+10 perf-stat.ps.instructions
6.597e+11 +3.4% 6.822e+11 perf-stat.total.instructions
17.43 ± 5% -1.9 15.50 ± 2% perf-profile.calltrace.cycles-pp.queue_folios_pte_range.walk_pmd_range.walk_pud_range.walk_p4d_range.walk_pgd_range
18.49 ± 4% -1.9 16.61 ± 2% perf-profile.calltrace.cycles-pp.walk_pmd_range.walk_pud_range.walk_p4d_range.walk_pgd_range.__walk_page_range
19.07 ± 4% -1.8 17.25 ± 2% perf-profile.calltrace.cycles-pp.walk_pud_range.walk_p4d_range.walk_pgd_range.__walk_page_range.walk_page_range
19.67 ± 4% -1.8 17.86 ± 2% perf-profile.calltrace.cycles-pp.walk_p4d_range.walk_pgd_range.__walk_page_range.walk_page_range.migrate_to_node
3.76 ± 4% -0.4 3.33 ± 9% perf-profile.calltrace.cycles-pp.mt_find.find_vma.queue_pages_test_walk.walk_page_range.migrate_to_node
3.94 ± 4% -0.4 3.53 ± 8% perf-profile.calltrace.cycles-pp.find_vma.queue_pages_test_walk.walk_page_range.migrate_to_node.do_migrate_pages
17.60 ± 4% -1.9 15.71 ± 2% perf-profile.children.cycles-pp.queue_folios_pte_range
18.50 ± 4% -1.9 16.63 ± 2% perf-profile.children.cycles-pp.walk_pmd_range
19.11 ± 4% -1.8 17.29 ± 2% perf-profile.children.cycles-pp.walk_pud_range
19.69 ± 4% -1.8 17.88 ± 2% perf-profile.children.cycles-pp.walk_p4d_range
20.79 ± 4% -1.8 19.02 ± 3% perf-profile.children.cycles-pp.__walk_page_range
0.08 ± 19% +0.1 0.15 ± 17% perf-profile.children.cycles-pp.rcu_all_qs
0.27 ± 9% +0.1 0.35 ± 13% perf-profile.children.cycles-pp.__cond_resched
11.70 ± 6% -1.9 9.84 perf-profile.self.cycles-pp.queue_folios_pte_range
2.01 ± 10% -0.3 1.72 ± 6% perf-profile.self.cycles-pp.vm_normal_folio
0.14 ± 20% +0.1 0.22 ± 16% perf-profile.self.cycles-pp.__cond_resched
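
When reading the rows above, note that the change column mixes two units: rows whose change ends in "%" are relative changes, while the perf-profile rows (no "%" suffix) appear to be absolute differences in percentage points. The following is a minimal sketch of a row parser that makes the distinction explicit. The row layout it assumes (base value, optional ± stddev, change, new value, optional ± stddev, metric name) is inferred from this report, and the names ROW and parse_row are hypothetical, not part of lkp-tests.

```python
import re

# Assumed row layout, inferred from the table in this report:
#   <base> [± <sd>%] <change> <new> [± <sd>%] <metric>
ROW = re.compile(
    r"^\s*(?P<base>[\d.]+(?:e[+-]?\d+)?)"
    r"(?:\s*±\s*(?P<base_sd>\d+)%)?"
    r"\s+(?P<change>[+-][\d.]+%?)"
    r"\s+(?P<new>[\d.]+(?:e[+-]?\d+)?)"
    r"(?:\s*±\s*(?P<new_sd>\d+)%)?"
    r"\s+(?P<metric>\S+)\s*$"
)


def parse_row(line):
    """Parse one comparison row and recompute its change from the two values."""
    m = ROW.match(line)
    if m is None:
        raise ValueError(f"unrecognised row: {line!r}")
    base, new = float(m["base"]), float(m["new"])
    relative = m["change"].endswith("%")
    if relative:
        # Relative change in percent of the base value.
        recomputed = (new - base) / base * 100.0
    else:
        # Absolute difference (percentage points for perf-profile rows).
        recomputed = new - base
    return {
        "metric": m["metric"],
        "base": base,
        "new": new,
        "reported": m["change"],
        "kind": "relative %" if relative else "absolute points",
        "recomputed": round(recomputed, 1),
    }


if __name__ == "__main__":
    for line in (
        "18.16 +4.7% 19.01 stress-ng.numa.ops_per_sec",
        "11.70 ± 6% -1.9 9.84 perf-profile.self.cycles-pp.queue_folios_pte_range",
    ):
        print(parse_row(line))
```

Recomputing from the rounded values printed in the report can differ from the reported change in the last digit (for example, the stress-ng.numa.ops row recomputes to 4.8 rather than 4.7), so the recomputed figure is a sanity check, not a replacement for the reported one.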


***************************************************************************************************
lkp-csl-d02: 36 threads, 1 socket Intel(R) Core(TM) i9-10980XE CPU @ 3.00GHz (Cascade Lake) with 128G memory
=========================================================================================
class/compiler/cpufreq_governor/disk/fs/kconfig/nr_threads/rootfs/tbox_group/test/testcase/testtime:
os/gcc-12/performance/1HDD/ext4/x86_64-rhel-8.3/1/debian-11.1-x86_64-20220510.cgz/lkp-csl-d02/numa/stress-ng/60s

commit:
45120b1574 ("mm/damon/vaddr-test: fix memory leak in damon_do_test_apply_three_regions()")
24526268f4 ("mm: mempolicy: keep VMA walk if both MPOL_MF_STRICT and MPOL_MF_MOVE are specified")

45120b15743fa7c0 24526268f4e38c9ec0c4a30de4f
---------------- ---------------------------
value ± %stddev   %change   value ± %stddev   metric
1023 ± 22% -42.3% 590.75 ± 35% sched_debug.cpu.nr_switches.min
1096 +4.5% 1145 stress-ng.numa.ops
18.26 +4.5% 19.08 stress-ng.numa.ops_per_sec
20712 ± 2% +4.6% 21663 stress-ng.time.involuntary_context_switches
6.57 ± 17% -1.4 5.17 ± 12% perf-profile.calltrace.cycles-pp.asm_sysvec_apic_timer_interrupt.cpuidle_enter_state.cpuidle_enter.cpuidle_idle_call.do_idle
5.55 ± 15% -1.0 4.55 ± 9% perf-profile.calltrace.cycles-pp.sysvec_apic_timer_interrupt.asm_sysvec_apic_timer_interrupt.cpuidle_enter_state.cpuidle_enter.cpuidle_idle_call
4.37 ± 17% -0.8 3.60 ± 8% perf-profile.calltrace.cycles-pp.__sysvec_apic_timer_interrupt.sysvec_apic_timer_interrupt.asm_sysvec_apic_timer_interrupt.cpuidle_enter_state.cpuidle_enter
4.32 ± 17% -0.7 3.57 ± 8% perf-profile.calltrace.cycles-pp.hrtimer_interrupt.__sysvec_apic_timer_interrupt.sysvec_apic_timer_interrupt.asm_sysvec_apic_timer_interrupt.cpuidle_enter_state
2.54 ± 17% -0.5 2.08 ± 10% perf-profile.calltrace.cycles-pp.tick_sched_timer.__hrtimer_run_queues.hrtimer_interrupt.__sysvec_apic_timer_interrupt.sysvec_apic_timer_interrupt
0.20 ± 28% -0.1 0.13 ± 27% perf-profile.children.cycles-pp.irqtime_account_irq
0.13 ± 19% +0.1 0.20 ± 24% perf-profile.children.cycles-pp.hrtimer_start_range_ns
2.068e+09 +3.7% 2.143e+09 perf-stat.i.branch-instructions
0.55 -0.0 0.52 perf-stat.i.branch-miss-rate%
12019422 -4.1% 11526701 perf-stat.i.branch-misses
0.50 -3.5% 0.48 perf-stat.i.cpi
1.767e+09 +3.6% 1.83e+09 perf-stat.i.dTLB-loads
1.009e+10 +3.5% 1.044e+10 perf-stat.i.instructions
19534 +2.4% 20010 perf-stat.i.instructions-per-iTLB-miss
2.03 +3.7% 2.11 perf-stat.i.ipc
123.98 +3.1% 127.81 perf-stat.i.metric.M/sec
0.58 -0.0 0.54 perf-stat.overall.branch-miss-rate%
0.49 -3.6% 0.48 perf-stat.overall.cpi
17843 +2.3% 18252 perf-stat.overall.instructions-per-iTLB-miss
2.02 +3.7% 2.10 perf-stat.overall.ipc
2.035e+09 +3.7% 2.11e+09 perf-stat.ps.branch-instructions
11834693 -4.1% 11344043 perf-stat.ps.branch-misses
1.739e+09 +3.6% 1.801e+09 perf-stat.ps.dTLB-loads
497472 +1.6% 505490 perf-stat.ps.iTLB-loads
9.932e+09 +3.5% 1.028e+10 perf-stat.ps.instructions
6.277e+11 +3.7% 6.512e+11 perf-stat.total.instructions
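
The cpi and ipc rows above describe the same effect from two sides, since cycles per instruction and instructions per cycle are reciprocals. As a rough consistency check on the perf-stat.overall values, the sketch below compares 1/cpi against the reported ipc. The values are copied from the rows above; the helper name ipc_from_cpi and the 0.05 tolerance are assumptions chosen to absorb the two-decimal rounding in the report.

```python
def ipc_from_cpi(cpi):
    """Instructions per cycle is the reciprocal of cycles per instruction."""
    return 1.0 / cpi


# (cpi, ipc) pairs from the perf-stat.overall rows of this table.
OVERALL = {
    "45120b1574": (0.49, 2.02),
    "24526268f4": (0.48, 2.10),
}

TOLERANCE = 0.05  # assumption: slack for two-decimal rounding in the report

if __name__ == "__main__":
    for commit, (cpi, ipc) in OVERALL.items():
        derived = ipc_from_cpi(cpi)
        ok = abs(derived - ipc) < TOLERANCE
        print(f"{commit}: 1/cpi = {derived:.2f}, reported ipc = {ipc:.2f}, consistent = {ok}")
```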





Disclaimer:
Results have been estimated based on internal Intel analysis and are provided
for informational purposes only. Any difference in system hardware or software
design or configuration may affect actual performance.


--
0-DAY CI Kernel Test Service
https://github.com/intel/lkp-tests/wiki