[tip:sched/core] [sched/numa] f169c62ff7: autonuma-benchmark.numa01.seconds -32.0% improvement

From: kernel test robot
Date: Wed Oct 18 2023 - 10:51:13 EST

Hello,

kernel test robot noticed a 32.0% improvement (a -32.0% change, i.e. lower runtime) of autonuma-benchmark.numa01.seconds on:


commit: f169c62ff7cd1acf8bac8ae17bfeafa307d9e6fa ("sched/numa: Complete scanning of inactive VMAs when there is no alternative")
https://git.kernel.org/cgit/linux/kernel/git/tip/tip.git sched/core

testcase: autonuma-benchmark
test machine: 96 threads 2 sockets Intel(R) Xeon(R) Platinum 8260L CPU @ 2.40GHz (Cascade Lake) with 128G memory
parameters:

iterations: 4x
test: _HARD_BIND
cpufreq_governor: performance

Details are as below:
-------------------------------------------------------------------------------------------------->


The kernel config and materials to reproduce are available at:
https://download.01.org/0day-ci/archive/20231018/202310182223.6ef26fcb-oliver.sang@xxxxxxxxx

=========================================================================================
compiler/cpufreq_governor/iterations/kconfig/rootfs/tbox_group/test/testcase:
gcc-12/performance/4x/x86_64-rhel-8.3/debian-11.1-x86_64-20220510.cgz/lkp-csl-2sp3/_HARD_BIND/autonuma-benchmark

commit:
b7a5b537c5 ("sched/numa: Complete scanning of partial VMAs regardless of PID activity")
f169c62ff7 ("sched/numa: Complete scanning of inactive VMAs when there is no alternative")

b7a5b537c55c088d            f169c62ff7cd1acf8bac8ae17bf
----------------            ---------------------------
         %stddev     %change         %stddev
             \          |                \
0.00 ± 58% +3290.8% 0.06 ± 30% vmstat.procs.b
2.101e+10 ± 6% -47.7% 1.099e+10 ± 5% cpuidle..time
21679574 ± 6% -47.8% 11321089 ± 5% cpuidle..usage
2441 -25.2% 1826 ± 2% uptime.boot
25666 ± 5% -40.3% 15332 ± 3% uptime.idle
208968 ± 8% -11.6% 184676 ± 3% meminfo.Active
208280 ± 8% -11.7% 183990 ± 3% meminfo.Active(anon)
221680 ± 8% -11.7% 195741 ± 2% meminfo.Shmem
3154708 ± 7% -12.1% 2773727 ± 4% numa-numastat.node0.local_node
3621021 ± 4% -12.9% 3154008 ± 3% numa-numastat.node0.numa_hit
3424075 ± 5% -19.2% 2767342 ± 3% numa-numastat.node1.local_node
3659250 ± 4% -15.8% 3079449 ± 3% numa-numastat.node1.numa_hit
3620012 ± 4% -12.9% 3153670 ± 3% numa-vmstat.node0.numa_hit
3153693 ± 7% -12.1% 2773388 ± 4% numa-vmstat.node0.numa_local
3657708 ± 4% -15.8% 3078660 ± 3% numa-vmstat.node1.numa_hit
3422533 ± 5% -19.2% 2766552 ± 3% numa-vmstat.node1.numa_local
0.03 ± 31% +840.9% 0.29 ±179% perf-sched.sch_delay.avg.ms.exit_to_user_mode_loop.exit_to_user_mode_prepare.syscall_exit_to_user_mode.do_syscall_64
2.34 ± 49% -83.7% 0.38 ±194% perf-sched.sch_delay.max.ms.__cond_resched.down_write_killable.setup_arg_pages.load_elf_binary.search_binary_handler
948.50 ± 16% -30.3% 661.33 ± 20% perf-sched.wait_and_delay.count.exit_to_user_mode_loop.exit_to_user_mode_prepare.syscall_exit_to_user_mode.do_syscall_64
1142 ± 9% -24.3% 865.00 ± 9% perf-sched.wait_and_delay.count.pipe_read.vfs_read.ksys_read.do_syscall_64
9.20 ± 6% -2.8 6.43 ± 5% mpstat.cpu.all.idle%
0.00 ± 14% +0.0 0.03 ± 28% mpstat.cpu.all.iowait%
2.25 -0.5 1.73 mpstat.cpu.all.irq%
0.08 ± 3% -0.0 0.06 ± 2% mpstat.cpu.all.soft%
2.18 ± 4% +0.5 2.69 ± 2% mpstat.cpu.all.sys%
474.78 -32.0% 322.76 ± 2% autonuma-benchmark.numa01.seconds
2386 -25.7% 1774 ± 2% autonuma-benchmark.time.elapsed_time
2386 -25.7% 1774 ± 2% autonuma-benchmark.time.elapsed_time.max
1298583 ± 2% -24.3% 983395 ± 3% autonuma-benchmark.time.involuntary_context_switches
3120169 -6.0% 2932947 autonuma-benchmark.time.minor_page_faults
8443 +3.9% 8771 autonuma-benchmark.time.percent_of_cpu_this_job_got
197252 -23.2% 151486 ± 2% autonuma-benchmark.time.user_time
26055 +45.4% 37896 ± 8% autonuma-benchmark.time.voluntary_context_switches
42526 ± 9% -15.6% 35892 ± 10% turbostat.C1
0.03 +0.0 0.05 ± 20% turbostat.C1E%
21430219 ± 6% -48.2% 11099622 ± 5% turbostat.C6
9.14 ± 6% -2.7 6.40 ± 5% turbostat.C6%
8.80 ± 6% -31.1% 6.06 ± 5% turbostat.CPU%c1
2.31e+08 -24.9% 1.735e+08 ± 2% turbostat.IRQ
37846 ± 7% -29.3% 26768 ± 9% turbostat.POLL
283.70 +3.1% 292.44 turbostat.PkgWatt
63.32 +16.3% 73.65 turbostat.RAMWatt
52079 ± 8% -11.6% 46034 ± 3% proc-vmstat.nr_active_anon
1560738 -1.8% 1532470 proc-vmstat.nr_anon_pages
3000 -2.2% 2934 proc-vmstat.nr_anon_transparent_hugepages
754040 -0.9% 747564 proc-vmstat.nr_file_pages
1564055 -1.8% 1535343 proc-vmstat.nr_inactive_anon
3964 -1.4% 3911 proc-vmstat.nr_page_table_pages
55430 ± 8% -11.7% 48954 ± 2% proc-vmstat.nr_shmem
52079 ± 8% -11.6% 46034 ± 3% proc-vmstat.nr_zone_active_anon
1564055 -1.8% 1535343 proc-vmstat.nr_zone_inactive_anon
31021 ± 4% +356.7% 141673 ± 3% proc-vmstat.numa_hint_faults
26070 ± 14% +232.5% 86687 ± 5% proc-vmstat.numa_hint_faults_local
7281579 -14.4% 6235148 proc-vmstat.numa_hit
2255 ± 9% +3726.0% 86295 ± 3% proc-vmstat.numa_huge_pte_updates
6580091 -15.8% 5542759 proc-vmstat.numa_local
490705 +1551.4% 8103581 ± 3% proc-vmstat.numa_pages_migrated
1256422 ± 9% +3426.6% 44309531 ± 3% proc-vmstat.numa_pte_updates
8560551 -16.7% 7127497 proc-vmstat.pgfault
490705 +1551.4% 8103581 ± 3% proc-vmstat.pgmigrate_success
370032 ± 2% -19.7% 296973 ± 2% proc-vmstat.pgreuse
951.67 +1560.5% 15802 ± 3% proc-vmstat.thp_migration_success
17842432 -25.5% 13288960 ± 2% proc-vmstat.unevictable_pgs_scanned
1.408e+08 ± 2% -26.2% 1.04e+08 ± 2% sched_debug.cfs_rq:/.avg_vruntime.avg
1.476e+08 ± 2% -27.3% 1.073e+08 ± 2% sched_debug.cfs_rq:/.avg_vruntime.max
1.252e+08 -24.9% 94038602 ± 3% sched_debug.cfs_rq:/.avg_vruntime.min
4459866 ± 7% -40.1% 2672703 ± 3% sched_debug.cfs_rq:/.avg_vruntime.stddev
0.22 ± 4% -10.8% 0.20 ± 3% sched_debug.cfs_rq:/.h_nr_running.stddev
1.408e+08 ± 2% -26.2% 1.04e+08 ± 2% sched_debug.cfs_rq:/.min_vruntime.avg
1.476e+08 ± 2% -27.3% 1.073e+08 ± 2% sched_debug.cfs_rq:/.min_vruntime.max
1.252e+08 -24.9% 94038602 ± 3% sched_debug.cfs_rq:/.min_vruntime.min
4459866 ± 7% -40.1% 2672703 ± 3% sched_debug.cfs_rq:/.min_vruntime.stddev
131.57 ± 11% -17.7% 108.34 ± 4% sched_debug.cfs_rq:/.util_avg.stddev
6.43 ± 26% +4528.1% 297.67 ± 3% sched_debug.cfs_rq:/.util_est_enqueued.avg
289.38 ± 9% +195.6% 855.43 ± 4% sched_debug.cfs_rq:/.util_est_enqueued.max
37.21 ± 11% +508.1% 226.25 ± 5% sched_debug.cfs_rq:/.util_est_enqueued.stddev
827702 ± 3% -12.7% 722480 ± 6% sched_debug.cpu.avg_idle.min
306806 ± 5% -18.8% 249093 ± 5% sched_debug.cpu.avg_idle.stddev
1219552 ± 2% -25.7% 906296 ± 2% sched_debug.cpu.clock.avg
1219905 ± 2% -25.7% 906607 ± 2% sched_debug.cpu.clock.max
1219166 ± 2% -25.7% 905967 ± 2% sched_debug.cpu.clock.min
212.97 ± 6% -13.3% 184.61 ± 11% sched_debug.cpu.clock.stddev
1191500 ± 2% -25.3% 890437 ± 2% sched_debug.cpu.clock_task.avg
1199519 ± 2% -25.6% 892331 ± 2% sched_debug.cpu.clock_task.max
1176815 ± 2% -25.1% 881433 ± 2% sched_debug.cpu.clock_task.min
7162 ± 46% -77.7% 1593 ± 19% sched_debug.cpu.clock_task.stddev
28968 ± 4% -16.3% 24235 ± 6% sched_debug.cpu.curr->pid.avg
36717 -20.7% 29133 ± 2% sched_debug.cpu.curr->pid.max
3810 ± 11% -34.7% 2487 ± 11% sched_debug.cpu.curr->pid.stddev
1074237 -17.0% 892137 ± 6% sched_debug.cpu.max_idle_balance_cost.max
137289 ± 3% -43.4% 77712 ± 11% sched_debug.cpu.max_idle_balance_cost.stddev
0.00 ± 6% -13.2% 0.00 ± 10% sched_debug.cpu.next_balance.stddev
0.23 ± 3% -10.5% 0.20 ± 4% sched_debug.cpu.nr_running.stddev
22190 ± 2% -23.8% 16898 ± 3% sched_debug.cpu.nr_switches.avg
9201 ± 4% -15.7% 7756 ± 6% sched_debug.cpu.nr_switches.min
10098 ± 3% -15.0% 8588 ± 7% sched_debug.cpu.nr_switches.stddev
7.02 ± 10% +43.9% 10.10 ± 12% sched_debug.cpu.nr_uninterruptible.stddev
1219153 ± 2% -25.7% 905956 ± 2% sched_debug.cpu_clk
1218584 ± 2% -25.7% 905387 ± 2% sched_debug.ktime
1219711 ± 2% -25.7% 906506 ± 2% sched_debug.sched_clk
26.19 +28.8% 33.74 perf-stat.i.MPKI
1.144e+08 +4.0% 1.19e+08 perf-stat.i.branch-instructions
1.46 -0.1 1.37 perf-stat.i.branch-miss-rate%
63.10 +3.1 66.21 perf-stat.i.cache-miss-rate%
14186186 +29.8% 18412463 perf-stat.i.cache-misses
22674336 +24.7% 28269331 perf-stat.i.cache-references
502.02 +2.0% 512.29 perf-stat.i.cpi
2.643e+11 +3.9% 2.745e+11 perf-stat.i.cpu-cycles
126.73 +7.2% 135.83 perf-stat.i.cpu-migrations
21465 -23.4% 16442 perf-stat.i.cycles-between-cache-misses
1.572e+08 +3.2% 1.623e+08 perf-stat.i.dTLB-loads
0.27 +0.0 0.29 perf-stat.i.dTLB-store-miss-rate%
223732 +9.2% 244255 perf-stat.i.dTLB-store-misses
88029014 +3.0% 90626204 perf-stat.i.dTLB-stores
80.17 +1.7 81.85 perf-stat.i.iTLB-load-miss-rate%
314646 -16.1% 263974 ± 5% perf-stat.i.iTLB-loads
5.846e+08 +3.7% 6.063e+08 perf-stat.i.instructions
1246 ± 2% +13.7% 1416 ± 4% perf-stat.i.instructions-per-iTLB-miss
2.75 +3.9% 2.86 perf-stat.i.metric.GHz
2.57 +9.1% 2.81 perf-stat.i.metric.M/sec
3527 +10.9% 3913 perf-stat.i.minor-faults
233066 +31.7% 306835 perf-stat.i.node-load-misses
238123 +13.9% 271245 perf-stat.i.node-loads
3112998 +44.6% 4502856 ± 2% perf-stat.i.node-store-misses
9385414 +24.5% 11686725 ± 2% perf-stat.i.node-stores
3527 +10.9% 3913 perf-stat.i.page-faults
24.08 +25.2% 30.16 perf-stat.overall.MPKI
1.54 -0.1 1.49 perf-stat.overall.branch-miss-rate%
63.01 +2.4 65.42 perf-stat.overall.cache-miss-rate%
19050 -20.6% 15118 perf-stat.overall.cycles-between-cache-misses
0.25 +0.0 0.27 perf-stat.overall.dTLB-store-miss-rate%
62.66 +2.2 64.86 perf-stat.overall.iTLB-load-miss-rate%
1161 ± 4% +10.8% 1286 ± 4% perf-stat.overall.instructions-per-iTLB-miss
49.19 +3.7 52.85 perf-stat.overall.node-load-miss-rate%
26.10 +2.6 28.67 ± 2% perf-stat.overall.node-store-miss-rate%
1.134e+08 +4.5% 1.185e+08 perf-stat.ps.branch-instructions
13962667 +30.4% 18209922 perf-stat.ps.cache-misses
22159412 +25.6% 27835158 perf-stat.ps.cache-references
2.66e+11 +3.5% 2.752e+11 perf-stat.ps.cpu-cycles
126.06 +7.4% 135.36 perf-stat.ps.cpu-migrations
1.56e+08 +3.6% 1.617e+08 perf-stat.ps.dTLB-loads
222631 +9.1% 242846 perf-stat.ps.dTLB-store-misses
87353212 +3.2% 90174337 perf-stat.ps.dTLB-stores
297718 ± 2% -14.4% 254772 ± 5% perf-stat.ps.iTLB-loads
5.797e+08 +4.1% 6.038e+08 perf-stat.ps.instructions
3454 +11.6% 3856 perf-stat.ps.minor-faults
231688 +32.7% 307530 perf-stat.ps.node-load-misses
239364 +14.6% 274365 ± 2% perf-stat.ps.node-loads
3208117 +42.8% 4581795 ± 2% perf-stat.ps.node-store-misses
9084897 +25.5% 11404755 ± 2% perf-stat.ps.node-stores
3455 +11.6% 3856 perf-stat.ps.page-faults
1.384e+12 -22.6% 1.072e+12 perf-stat.total.instructions
45.09 ± 10% -30.4 14.72 ± 55% perf-profile.calltrace.cycles-pp.asm_sysvec_apic_timer_interrupt
4.07 ± 30% -1.9 2.14 ± 40% perf-profile.calltrace.cycles-pp.ret_from_fork_asm
4.07 ± 30% -1.9 2.14 ± 40% perf-profile.calltrace.cycles-pp.ret_from_fork.ret_from_fork_asm
4.07 ± 30% -1.9 2.14 ± 40% perf-profile.calltrace.cycles-pp.kthread.ret_from_fork.ret_from_fork_asm
1.74 ± 33% -1.2 0.57 ± 72% perf-profile.calltrace.cycles-pp.hpage_collapse_scan_pmd.khugepaged_scan_mm_slot.khugepaged.kthread.ret_from_fork
1.75 ± 33% -1.2 0.58 ± 73% perf-profile.calltrace.cycles-pp.khugepaged.kthread.ret_from_fork.ret_from_fork_asm
1.74 ± 33% -1.2 0.58 ± 73% perf-profile.calltrace.cycles-pp.khugepaged_scan_mm_slot.khugepaged.kthread.ret_from_fork.ret_from_fork_asm
1.69 ± 33% -1.1 0.55 ± 72% perf-profile.calltrace.cycles-pp.collapse_huge_page.hpage_collapse_scan_pmd.khugepaged_scan_mm_slot.khugepaged.kthread
1.58 ± 33% -1.1 0.51 ± 72% perf-profile.calltrace.cycles-pp.__collapse_huge_page_copy.collapse_huge_page.hpage_collapse_scan_pmd.khugepaged_scan_mm_slot.khugepaged
1.38 ± 32% -0.9 0.45 ± 72% perf-profile.calltrace.cycles-pp.copy_mc_fragile.__collapse_huge_page_copy.collapse_huge_page.hpage_collapse_scan_pmd.khugepaged_scan_mm_slot
1.90 ± 17% -0.9 0.99 ± 54% perf-profile.calltrace.cycles-pp.evsel__read_counter.read_counters.process_interval.dispatch_events.cmd_stat
1.20 ± 18% -0.6 0.55 ± 75% perf-profile.calltrace.cycles-pp.readn.evsel__read_counter.read_counters.process_interval.dispatch_events
1.20 ± 18% -0.6 0.55 ± 75% perf-profile.calltrace.cycles-pp.__libc_read.readn.evsel__read_counter.read_counters.process_interval
1.17 ± 19% -0.6 0.54 ± 75% perf-profile.calltrace.cycles-pp.entry_SYSCALL_64_after_hwframe.__libc_read.readn.evsel__read_counter.read_counters
1.17 ± 19% -0.6 0.54 ± 75% perf-profile.calltrace.cycles-pp.do_syscall_64.entry_SYSCALL_64_after_hwframe.__libc_read.readn.evsel__read_counter
46.12 ± 10% -30.1 15.98 ± 51% perf-profile.children.cycles-pp.asm_sysvec_apic_timer_interrupt
4.07 ± 30% -1.9 2.14 ± 40% perf-profile.children.cycles-pp.kthread
1.74 ± 33% -1.1 0.67 ± 42% perf-profile.children.cycles-pp.hpage_collapse_scan_pmd
1.74 ± 33% -1.1 0.67 ± 43% perf-profile.children.cycles-pp.khugepaged_scan_mm_slot
1.75 ± 33% -1.1 0.68 ± 43% perf-profile.children.cycles-pp.khugepaged
1.69 ± 33% -1.0 0.64 ± 41% perf-profile.children.cycles-pp.collapse_huge_page
1.58 ± 33% -1.0 0.60 ± 41% perf-profile.children.cycles-pp.__collapse_huge_page_copy
1.90 ± 17% -0.9 1.05 ± 42% perf-profile.children.cycles-pp.evsel__read_counter
1.92 ± 18% -0.8 1.10 ± 44% perf-profile.children.cycles-pp.readn
1.92 ± 18% -0.8 1.10 ± 44% perf-profile.children.cycles-pp.__libc_read
0.87 ± 15% -0.4 0.51 ± 38% perf-profile.children.cycles-pp.do_anonymous_page
0.65 ± 16% -0.2 0.42 ± 37% perf-profile.children.cycles-pp.vma_alloc_folio
0.46 ± 26% -0.2 0.23 ± 45% perf-profile.children.cycles-pp.asm_sysvec_reschedule_ipi
0.44 ± 22% -0.2 0.24 ± 46% perf-profile.children.cycles-pp.mprotect_fixup
0.40 ± 18% -0.2 0.21 ± 41% perf-profile.children.cycles-pp.evlist__id2evsel
0.24 ± 46% -0.2 0.07 ± 64% perf-profile.children.cycles-pp.start_thread
0.23 ± 45% -0.2 0.07 ± 64% perf-profile.children.cycles-pp.perf_evlist__poll_thread
0.40 ± 17% -0.2 0.25 ± 34% perf-profile.children.cycles-pp.mas_store_prealloc
0.28 ± 19% -0.1 0.14 ± 40% perf-profile.children.cycles-pp.__fget_light
0.26 ± 32% -0.1 0.13 ± 33% perf-profile.children.cycles-pp.pte_alloc_one
0.18 ± 43% -0.1 0.06 ± 80% perf-profile.children.cycles-pp.__collapse_huge_page_copy_succeeded
0.16 ± 30% -0.1 0.04 ±109% perf-profile.children.cycles-pp.proc_pid_get_link
0.30 ± 21% -0.1 0.19 ± 40% perf-profile.children.cycles-pp.__mem_cgroup_charge
0.19 ± 16% -0.1 0.09 ± 62% perf-profile.children.cycles-pp.__perf_read_group_add
0.22 ± 22% -0.1 0.12 ± 27% perf-profile.children.cycles-pp.shift_arg_pages
0.15 ± 33% -0.1 0.06 ± 71% perf-profile.children.cycles-pp.xas_find
0.12 ± 26% -0.1 0.03 ±101% perf-profile.children.cycles-pp.folio_add_new_anon_rmap
0.15 ± 28% -0.1 0.06 ± 63% perf-profile.children.cycles-pp.switch_mm_irqs_off
0.14 ± 27% -0.1 0.06 ± 71% perf-profile.children.cycles-pp.try_charge_memcg
0.19 ± 29% -0.1 0.11 ± 51% perf-profile.children.cycles-pp.down_write
0.17 ± 32% -0.1 0.09 ± 39% perf-profile.children.cycles-pp.__pte_alloc
0.10 ± 32% -0.1 0.03 ±101% perf-profile.children.cycles-pp.blk_mq_queue_tag_busy_iter
0.15 ± 16% -0.1 0.08 ± 54% perf-profile.children.cycles-pp.vsnprintf
0.10 ± 32% -0.1 0.04 ± 73% perf-profile.children.cycles-pp.blk_mq_in_flight
0.12 ± 20% -0.1 0.06 ± 48% perf-profile.children.cycles-pp.slab_show
0.16 ± 24% -0.1 0.10 ± 38% perf-profile.children.cycles-pp.__get_free_pages
0.09 ± 35% -0.1 0.04 ± 79% perf-profile.children.cycles-pp.lookup_open
0.00 +0.3 0.27 ± 57% perf-profile.children.cycles-pp.__sysvec_call_function
0.03 ±100% +0.3 0.33 ± 50% perf-profile.children.cycles-pp.__flush_smp_call_function_queue
0.00 +0.4 0.38 ± 53% perf-profile.children.cycles-pp.sysvec_call_function
0.04 ± 80% +1.6 1.67 ± 58% perf-profile.children.cycles-pp.asm_sysvec_call_function
43.32 ± 10% -29.9 13.42 ± 61% perf-profile.self.cycles-pp.asm_sysvec_apic_timer_interrupt
0.40 ± 17% -0.2 0.20 ± 41% perf-profile.self.cycles-pp.evlist__id2evsel
0.28 ± 19% -0.1 0.13 ± 40% perf-profile.self.cycles-pp.__fget_light
0.24 ± 30% -0.1 0.12 ± 29% perf-profile.self.cycles-pp.zap_pte_range
0.14 ± 28% -0.1 0.06 ± 76% perf-profile.self.cycles-pp.evlist_cpu_iterator__next
0.13 ± 30% -0.1 0.05 ± 75% perf-profile.self.cycles-pp.try_charge_memcg
0.17 ± 27% -0.1 0.10 ± 51% perf-profile.self.cycles-pp.down_write
0.10 ± 31% -0.1 0.03 ±102% perf-profile.self.cycles-pp.blk_mq_queue_tag_busy_iter
0.13 ± 30% -0.1 0.06 ± 65% perf-profile.self.cycles-pp.switch_mm_irqs_off
0.11 ± 30% -0.1 0.05 ± 75% perf-profile.self.cycles-pp.mutex_lock
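
For readers who want to sanity-check the %change column, the following is a minimal sketch of how a row of the table above can be parsed and the change recomputed. It assumes the row layout "base [± stddev%] change new [± stddev%] metric" as it appears in this report; the names ROW, parse_row and recompute are hypothetical and not part of lkp-tests. It also assumes that a change ending in "%" is a relative change, while a bare signed number (as in the mpstat, turbostat C-state and perf-profile rows) is an absolute difference in percentage points. Because the table prints rounded means, the recomputed value may differ slightly from the reported one, and rows whose base rounds to 0.00 cannot be recomputed at all.

```python
import re

# Hypothetical pattern for one comparison row, inferred from the table above:
#   base [± sd%]  change  new [± sd%]  metric
ROW = re.compile(
    r"^\s*(?P<base>[\d.e+]+)\s*(?:±\s*(?P<base_sd>\d+)%)?\s+"
    r"(?P<change>[+-][\d.]+%?)\s+"
    r"(?P<new>[\d.e+]+)\s*(?:±\s*(?P<new_sd>\d+)%)?\s+"
    r"(?P<metric>\S+)\s*$"
)


def parse_row(line):
    """Split one table row into its fields; return None if it does not match."""
    m = ROW.match(line)
    if m is None:
        return None
    change = m.group("change")
    return {
        "metric": m.group("metric"),
        "base": float(m.group("base")),
        "new": float(m.group("new")),
        "reported": float(change.rstrip("%")),
        # True: relative change in percent; False: absolute point difference
        "is_percent": change.endswith("%"),
    }


def recompute(row):
    """Recompute the change from the rounded base/new values shown in the row."""
    if not row["is_percent"]:
        return row["new"] - row["base"]
    if row["base"] == 0:
        return None  # base printed as 0.00: relative change is not recoverable
    return (row["new"] - row["base"]) / row["base"] * 100.0


if __name__ == "__main__":
    for text in (
        "474.78 -32.0% 322.76 ± 2% autonuma-benchmark.numa01.seconds",
        "9.20 ± 6% -2.8 6.43 ± 5% mpstat.cpu.all.idle%",
    ):
        row = parse_row(text)
        print(row["metric"], row["reported"], round(recompute(row), 1))
```

For the headline metric this gives (322.76 - 474.78) / 474.78 * 100, which rounds to the reported -32.0%; for mpstat.cpu.all.idle% the difference 6.43 - 9.20 rounds to the reported -2.8 points.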




Disclaimer:
Results have been estimated based on internal Intel analysis and are provided
for informational purposes only. Any difference in system hardware or software
design or configuration may affect actual performance.


--
0-DAY CI Kernel Test Service
https://github.com/intel/lkp-tests/wiki