[linus:master] [maple_tree] 4249f13c11: aim9.page_test.ops_per_sec 3.5% improvement

From: kernel test robot
Date: Tue Jan 09 2024 - 09:05:43 EST




Hello,

kernel test robot noticed a 3.5% improvement of aim9.page_test.ops_per_sec on:


commit: 4249f13c11be8b8b7bf93204185e150c3bdc968d ("maple_tree: do not preallocate nodes for slot stores")
https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git master

testcase: aim9
test machine: 48 threads 2 sockets Intel(R) Xeon(R) CPU E5-2697 v2 @ 2.70GHz (Ivy Bridge-EP) with 112G memory
parameters:

	testtime: 300s
	test: page_test
	cpufreq_governor: performance






Details are below:
-------------------------------------------------------------------------------------------------->


The kernel config and materials to reproduce are available at:
https://download.01.org/0day-ci/archive/20240109/202401091651.a189376-oliver.sang@xxxxxxxxx

=========================================================================================
compiler/cpufreq_governor/kconfig/rootfs/tbox_group/test/testcase/testtime:
gcc-12/performance/x86_64-rhel-8.3/debian-11.1-x86_64-20220510.cgz/lkp-ivb-2ep1/page_test/aim9/300s

commit:
e2c27b803b ("mm/filemap: avoid buffered read/write race to read inconsistent data")
4249f13c11 ("maple_tree: do not preallocate nodes for slot stores")

e2c27b803bb66474 4249f13c11be8b8b7bf93204185
---------------- ---------------------------
         %stddev   %change           %stddev
             \         |                 \
    336518           +3.5%      348367        aim9.page_test.ops_per_sec
  95019000           +3.5%    98364469        aim9.time.minor_page_faults
     25318           +2.3%       25903        proc-vmstat.nr_active_anon
     26605           +2.2%       27197        proc-vmstat.nr_shmem
     25318           +2.3%       25903        proc-vmstat.nr_zone_active_anon
 1.087e+08           +3.3%   1.122e+08        proc-vmstat.numa_hit
 1.085e+08           +3.4%   1.121e+08        proc-vmstat.numa_local
 1.079e+08           +3.5%   1.117e+08        proc-vmstat.pgalloc_normal
  95763046           +3.5%    99109694        proc-vmstat.pgfault
 1.078e+08           +3.5%   1.116e+08        proc-vmstat.pgfree
  56340620           +1.4%    57128415        perf-stat.i.cache-references
   3744535           -7.4%     3468589        perf-stat.i.iTLB-load-misses
    923.85           +8.2%      999.87        perf-stat.i.instructions-per-iTLB-miss
    318120           +3.5%      329244        perf-stat.i.minor-faults
    318120           +3.5%      329244        perf-stat.i.page-faults
     12.48            -0.2       12.32        perf-stat.overall.cache-miss-rate%
    911.69           +8.5%      988.95        perf-stat.overall.instructions-per-iTLB-miss
  56153225           +1.4%    56938073        perf-stat.ps.cache-references
   3731915           -7.4%     3456934        perf-stat.ps.iTLB-load-misses
    317046           +3.5%      328134        perf-stat.ps.minor-faults
    317046           +3.5%      328134        perf-stat.ps.page-faults
      1.54 ± 15%      -0.9        0.61 ± 35%  perf-profile.calltrace.cycles-pp.mas_preallocate.do_brk_flags.__do_sys_brk.do_syscall_64.entry_SYSCALL_64_after_hwframe
      1.56 ± 16%      -0.9        0.67 ± 18%  perf-profile.children.cycles-pp.mas_preallocate
      0.59 ± 18%      -0.5        0.06 ± 66%  perf-profile.children.cycles-pp.mas_destroy
      0.03 ± 84%      +0.1        0.13 ± 26%  perf-profile.children.cycles-pp.anon_vma_interval_tree_insert
      0.18 ± 27%      +0.2        0.42 ± 15%  perf-profile.children.cycles-pp.vma_adjust_trans_huge
      0.28 ± 12%      +0.3        0.57 ± 14%  perf-profile.children.cycles-pp.vma_complete
      0.20 ± 28%      -0.1        0.13 ± 24%  perf-profile.self.cycles-pp.security_mmap_addr
      0.16 ± 23%      -0.1        0.10 ± 17%  perf-profile.self.cycles-pp.__perf_sw_event
      0.17 ± 18%      +0.1        0.27 ± 30%  perf-profile.self.cycles-pp.get_vma_policy
      0.02 ±118%      +0.1        0.13 ± 26%  perf-profile.self.cycles-pp.anon_vma_interval_tree_insert
      0.08 ± 25%      +0.2        0.24 ± 13%  perf-profile.self.cycles-pp.vma_complete
      0.18 ± 28%      +0.2        0.42 ± 15%  perf-profile.self.cycles-pp.vma_adjust_trans_huge
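For readers checking the comparison table above, the middle column can be re-derived from the two outer values. The sketch below is not part of the robot's tooling; the helper names pct_change and point_delta are made up for illustration. It assumes that entries with a "%" sign are relative changes against the parent commit, and that entries without one (the perf-profile and cache-miss-rate rows) are plain differences in percentage points, which is what the quoted numbers appear to show.

```python
# Illustrative only: re-derive the change column from values quoted in the table.
# pct_change and point_delta are hypothetical helper names, not lkp-tests APIs.

def pct_change(base, new):
    """Relative change of `new` against `base`, in percent."""
    return (new - base) / base * 100.0

def point_delta(base, new):
    """Absolute difference, used here for metrics that are already percentages."""
    return new - base

# (parent commit e2c27b803b, tested commit 4249f13c11)
relative_rows = {
    "aim9.page_test.ops_per_sec": (336518, 348367),
    "perf-stat.i.iTLB-load-misses": (3744535, 3468589),
    "perf-stat.i.instructions-per-iTLB-miss": (923.85, 999.87),
}
for name, (base, new) in relative_rows.items():
    print(f"{pct_change(base, new):+.1f}%  {name}")

# Assumption: these rows are shares of sampled cycles, so the change is in points.
profile_rows = {
    "perf-profile.children.cycles-pp.mas_preallocate": (1.56, 0.67),
    "perf-profile.children.cycles-pp.mas_destroy": (0.59, 0.06),
}
for name, (base, new) in profile_rows.items():
    print(f"{point_delta(base, new):+.1f}   {name}")
```

Under these assumptions the script prints +3.5%, -7.4% and +8.2% for the first group and -0.9 and -0.5 for the second, matching the table.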




Disclaimer:
Results have been estimated based on internal Intel analysis and are provided
for informational purposes only. Any difference in system hardware or software
design or configuration may affect actual performance.


--
0-DAY CI Kernel Test Service
https://github.com/intel/lkp-tests/wiki