[tip:sched/eevdf] [sched/fair] e0c2ff903c: phoronix-test-suite.blogbench.Write.final_score -34.8% regression

From: kernel test robot
Date: Thu Aug 10 2023 - 09:25:31 EST




Hello,

kernel test robot noticed a 34.8% regression in phoronix-test-suite.blogbench.Write.final_score on:


commit: e0c2ff903c320d3fd3c2c604dc401b3b7c0a1d13 ("sched/fair: Remove sched_feat(START_DEBIT)")
https://git.kernel.org/cgit/linux/kernel/git/tip/tip.git sched/eevdf

testcase: phoronix-test-suite
test machine: 96 threads 2 sockets Intel(R) Xeon(R) Gold 6252 CPU @ 2.10GHz (Cascade Lake) with 512G memory
parameters:

test: blogbench-1.1.0
option_a: Write
cpufreq_governor: performance


(
We previously reported
"[tip:sched/eevdf] [sched/fair] e0c2ff903c: pft.faults_per_sec_per_cpu 7.0% improvement"
at
https://lore.kernel.org/all/202308091624.d97ae058-oliver.sang@xxxxxxxxx/
Since we have now also found a regression on the same commit, we are reporting again, FYI.
)

If you fix the issue in a separate patch/commit (i.e. not just a new version of
the same patch/commit), kindly add the following tags
| Reported-by: kernel test robot <oliver.sang@xxxxxxxxx>
| Closes: https://lore.kernel.org/oe-lkp/202308101628.7af4631a-oliver.sang@xxxxxxxxx


Details are as below:
-------------------------------------------------------------------------------------------------->


The kernel config and materials to reproduce are available at:
https://download.01.org/0day-ci/archive/20230810/202308101628.7af4631a-oliver.sang@xxxxxxxxx

=========================================================================================
compiler/cpufreq_governor/kconfig/option_a/rootfs/tbox_group/test/testcase:
gcc-12/performance/x86_64-rhel-8.3/Write/debian-x86_64-phoronix/lkp-csl-2sp7/blogbench-1.1.0/phoronix-test-suite

commit:
af4cf40470 ("sched/fair: Add cfs_rq::avg_vruntime")
e0c2ff903c ("sched/fair: Remove sched_feat(START_DEBIT)")

af4cf40470c22efa e0c2ff903c320d3fd3c2c604dc4
---------------- ---------------------------
         %stddev    %change           %stddev
            \          |                 \
     13.43 ±  6%      +1.7        15.15 ±  5%  mpstat.cpu.all.idle%
 4.516e+09 ±  7%     +13.6%   5.129e+09 ±  6%  cpuidle..time
   5386162 ±  6%     +15.1%     6199901 ±  5%  cpuidle..usage
   5829408 ±  7%      -8.7%     5320418 ±  4%  numa-numastat.node0.local_node
   5839930 ±  7%      -8.7%     5333078 ±  4%  numa-numastat.node0.numa_hit
   6025065 ±  5%     -12.9%     5245696 ±  7%  numa-numastat.node1.local_node
     57086           -28.2%       40989        vmstat.io.bo
  11120354           -21.6%     8721798        vmstat.memory.cache
     18750           +12.1%       21014        vmstat.system.cs
    215070 ± 25%     +29.4%      278379 ± 23%  numa-meminfo.node1.AnonPages.max
   5507703 ± 13%     -31.6%     3766183 ± 23%  numa-meminfo.node1.FilePages
   2118171 ± 13%     -26.7%     1551936 ± 25%  numa-meminfo.node1.Inactive
   6824965 ± 12%     -26.7%     5005740 ± 18%  numa-meminfo.node1.MemUsed
      4960           -34.8%        3235 ±  2%  phoronix-test-suite.blogbench.Write.final_score
  35120853           -29.4%    24804986        phoronix-test-suite.time.file_system_outputs
      8058            -1.9%        7908        phoronix-test-suite.time.percent_of_cpu_this_job_got
     26517            -1.2%       26196        phoronix-test-suite.time.system_time
   1445969           +10.6%     1599011        phoronix-test-suite.time.voluntary_context_switches
    600079 ±  3%     +26.4%      758665 ±  3%  turbostat.C1E
      0.75 ±  3%      +0.1         0.84 ±  4%  turbostat.C1E%
   4372028 ±  7%     +14.1%     4987180 ±  7%  turbostat.C6
     13.09 ±  6%      +1.6        14.72 ±  6%  turbostat.C6%
     11.39 ±  5%     +10.4%       12.57 ±  4%  turbostat.CPU%c1
   2330913 ± 13%     -23.5%     1782895 ± 14%  numa-vmstat.node0.nr_dirtied
   2256547 ± 13%     -23.8%     1719197 ± 14%  numa-vmstat.node0.nr_written
   5840154 ±  7%      -8.7%     5333324 ±  4%  numa-vmstat.node0.numa_hit
   5829632 ±  7%      -8.7%     5320664 ±  4%  numa-vmstat.node0.numa_local
   2287685 ± 11%     -33.8%     1514622 ± 20%  numa-vmstat.node1.nr_dirtied
   1376658 ± 13%     -31.6%      941500 ± 23%  numa-vmstat.node1.nr_file_pages
    127.17 ± 18%     -52.4%       60.50 ± 31%  numa-vmstat.node1.nr_writeback
   2215841 ± 12%     -34.1%     1460272 ± 20%  numa-vmstat.node1.nr_written
   6025232 ±  5%     -12.9%     5245703 ±  7%  numa-vmstat.node1.numa_local
     74.24 ±  8%     -38.1%       45.97 ± 11%  sched_debug.cfs_rq:/.load_avg.avg
      3167 ± 14%     -78.6%      676.56 ± 45%  sched_debug.cfs_rq:/.load_avg.max
    339.56 ± 15%     -74.2%       87.77 ± 38%  sched_debug.cfs_rq:/.load_avg.stddev
      1829 ±  6%     -12.7%        1597 ±  9%  sched_debug.cfs_rq:/.runnable_avg.max
      1490 ±  7%     -13.5%        1289 ±  7%  sched_debug.cfs_rq:/.util_avg.max
      1495 ±  7%     -13.3%        1296 ±  8%  sched_debug.cfs_rq:/.util_est_enqueued.max
     33772           +15.4%       38971        sched_debug.cpu.nr_switches.avg
    117253 ± 10%    +106.6%      242187 ± 13%  sched_debug.cpu.nr_switches.max
     11889 ±  9%     +96.8%       23399 ± 12%  sched_debug.cpu.nr_switches.stddev
  58611259           -50.0%    29305629        sched_debug.sysctl_sched.sysctl_sched_features
   6885022 ±  2%     -25.3%     5139844 ±  2%  meminfo.Active
   6735239 ±  2%     -25.9%     4988483 ±  2%  meminfo.Active(file)
  10388292           -21.3%     8173968        meminfo.Cached
   3953695           -11.9%     3483739        meminfo.Inactive
   2793179           -16.8%     2323187        meminfo.Inactive(file)
    716630           -21.7%      561030        meminfo.KReclaimable
  12938216           -18.5%    10538792        meminfo.Memused
    716630           -21.7%      561030        meminfo.SReclaimable
    462556           -10.4%      414369        meminfo.SUnreclaim
   1179187           -17.3%      975399        meminfo.Slab
  13543613           -19.3%    10926031        meminfo.max_used_kB
   1682930 ±  2%     -25.9%     1246741 ±  2%  proc-vmstat.nr_active_file
   4618598           -28.6%     3297498 ±  2%  proc-vmstat.nr_dirtied
    115523 ±  2%      -4.3%      110605        proc-vmstat.nr_dirty
   2596313           -21.3%     2043401        proc-vmstat.nr_file_pages
    698013           -16.8%      580677        proc-vmstat.nr_inactive_file
    179075           -21.7%      140220        proc-vmstat.nr_slab_reclaimable
    115576           -10.4%      103600        proc-vmstat.nr_slab_unreclaimable
    200.00 ± 12%     -60.2%       79.67 ± 27%  proc-vmstat.nr_writeback
   4472388           -28.9%     3179453 ±  2%  proc-vmstat.nr_written
   1682930 ±  2%     -25.9%     1246741 ±  2%  proc-vmstat.nr_zone_active_file
    698013           -16.8%      580677        proc-vmstat.nr_zone_inactive_file
    115652 ±  2%      -4.4%      110571        proc-vmstat.nr_zone_write_pending
  11888053           -10.8%    10598323 ±  2%  proc-vmstat.numa_hit
  11861225           -10.9%    10573027 ±  2%  proc-vmstat.numa_local
   3434596           -28.4%     2460772        proc-vmstat.pgactivate
  18752650            -8.2%    17219808 ±  3%  proc-vmstat.pgalloc_normal
  18252683            -8.4%    16727157 ±  3%  proc-vmstat.pgfree
  19379562           -27.5%    14053587 ±  2%  proc-vmstat.pgpgout
    205988            +3.5%      213118        proc-vmstat.pgreuse
     97.12            -1.1        96.06        perf-profile.calltrace.cycles-pp.entry_SYSCALL_64_after_hwframe
     97.08            -1.1        96.02        perf-profile.calltrace.cycles-pp.do_syscall_64.entry_SYSCALL_64_after_hwframe
     52.33            -1.0        51.36        perf-profile.calltrace.cycles-pp.__x64_sys_openat.do_syscall_64.entry_SYSCALL_64_after_hwframe
     52.32            -1.0        51.35        perf-profile.calltrace.cycles-pp.do_sys_openat2.__x64_sys_openat.do_syscall_64.entry_SYSCALL_64_after_hwframe
     97.12            -1.1        96.06        perf-profile.children.cycles-pp.entry_SYSCALL_64_after_hwframe
     97.08            -1.1        96.02        perf-profile.children.cycles-pp.do_syscall_64
     85.21            -1.0        84.18        perf-profile.children.cycles-pp.native_queued_spin_lock_slowpath
     85.58            -1.0        84.56        perf-profile.children.cycles-pp._raw_spin_lock
     52.33            -1.0        51.36        perf-profile.children.cycles-pp.__x64_sys_openat
     52.32            -1.0        51.35        perf-profile.children.cycles-pp.do_sys_openat2
      0.64 ±  2%      +0.0         0.68 ±  2%  perf-profile.children.cycles-pp.asm_sysvec_apic_timer_interrupt
      0.28 ±  5%      +0.0         0.32 ±  5%  perf-profile.children.cycles-pp.__irq_exit_rcu
      0.02 ±141%      +0.1         0.07 ± 15%  perf-profile.children.cycles-pp.__x64_sys_rename
      0.08 ± 29%      +0.1         0.14 ± 19%  perf-profile.children.cycles-pp.process_one_work
      0.01 ±223%      +0.1         0.07 ± 15%  perf-profile.children.cycles-pp.do_renameat2
      0.08 ± 29%      +0.1         0.15 ± 20%  perf-profile.children.cycles-pp.worker_thread
      0.03 ±101%      +0.1         0.10 ± 21%  perf-profile.children.cycles-pp.__extent_writepage
      0.03 ±103%      +0.1         0.11 ± 20%  perf-profile.children.cycles-pp.do_writepages
      0.03 ±103%      +0.1         0.11 ± 20%  perf-profile.children.cycles-pp.extent_writepages
      0.03 ±103%      +0.1         0.11 ± 20%  perf-profile.children.cycles-pp.extent_write_cache_pages
     84.73            -1.0        83.69        perf-profile.self.cycles-pp.native_queued_spin_lock_slowpath
 5.487e+09            -2.2%   5.364e+09        perf-stat.i.branch-instructions
      0.73 ±  8%      +0.2         0.92 ± 12%  perf-stat.i.branch-miss-rate%
     18867           +12.0%       21136        perf-stat.i.context-switches
 2.307e+11            -2.0%   2.261e+11        perf-stat.i.cpu-cycles
      1352 ±  2%      +3.5%        1399        perf-stat.i.cpu-migrations
   4638095 ± 27%     -38.2%     2865514 ± 23%  perf-stat.i.dTLB-load-misses
 6.107e+09            -2.3%   5.965e+09        perf-stat.i.dTLB-loads
 2.391e+10            -2.3%   2.335e+10        perf-stat.i.instructions
      0.17            -7.8%        0.16 ±  2%  perf-stat.i.ipc
      0.62 ±  2%     +14.5%        0.71 ±  6%  perf-stat.i.major-faults
      2.40            -2.0%        2.35        perf-stat.i.metric.GHz
    139.98            -2.3%      136.76        perf-stat.i.metric.M/sec
     66.90            -1.8        65.11        perf-stat.i.node-load-miss-rate%
  65029001            -5.5%    61464545 ±  3%  perf-stat.i.node-load-misses
      0.08 ± 27%      -0.0         0.05 ± 24%  perf-stat.overall.dTLB-load-miss-rate%
     67.28            -2.3        64.93        perf-stat.overall.node-load-miss-rate%
 5.474e+09            -2.2%   5.352e+09        perf-stat.ps.branch-instructions
     18821           +12.1%       21105        perf-stat.ps.context-switches
 2.301e+11            -2.0%   2.256e+11        perf-stat.ps.cpu-cycles
      1349 ±  2%      +3.6%        1398        perf-stat.ps.cpu-migrations
   4630176 ± 27%     -38.2%     2859589 ± 23%  perf-stat.ps.dTLB-load-misses
 6.092e+09            -2.3%   5.952e+09        perf-stat.ps.dTLB-loads
 2.385e+10            -2.3%    2.33e+10        perf-stat.ps.instructions
      0.61 ±  2%     +14.6%        0.70 ±  6%  perf-stat.ps.major-faults
  64859061            -5.5%    61314167 ±  3%  perf-stat.ps.node-load-misses
 8.031e+12            -1.4%   7.921e+12        perf-stat.total.instructions
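
For readers checking the table by hand: the report does not spell out how the change column is derived, but the figures are consistent with the usual convention that a change carrying a '%' sign is relative to the parent commit, while a change without one (on metrics that are themselves percentages) is a plain difference in percentage points. A minimal Python sketch under that assumption follows; the function names are illustrative and are not part of lkp-tests.

```python
def relative_change_pct(base: float, new: float) -> float:
    """Relative change in percent (assumed meaning of changes shown with '%')."""
    return (new - base) / base * 100.0


def point_change(base: float, new: float) -> float:
    """Plain difference (assumed meaning of changes shown without '%')."""
    return new - base


# Values copied from the comparison table in this report.
final_score = relative_change_pct(4960, 3235)
io_bo = relative_change_pct(57086, 40989)
idle_points = point_change(13.43, 15.15)

print(f"blogbench.Write.final_score: {final_score:+.1f}%")  # -34.8%
print(f"vmstat.io.bo:                {io_bo:+.1f}%")        # -28.2%
print(f"mpstat.cpu.all.idle%:        {idle_points:+.1f}")   # +1.7
```

All three results match the change column for those rows.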

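A note on the sysctl_sched_features row (58611259 to 29305629, -50.0%): this value is a bitmask of enabled scheduler features, so the halving is probably not a behavioral metric. One plausible reading, which the report itself does not confirm, is that removing START_DEBIT renumbered the feature bits above it. The Python sketch below assumes START_DEBIT occupied bit 1 in the parent commit; that bit position and the helper name are assumptions made for illustration.

```python
def drop_bit(mask: int, bit: int) -> int:
    """Remove one bit position from a bitmask, shifting higher bits down by one."""
    low = mask & ((1 << bit) - 1)   # bits below the removed position, unchanged
    high = mask >> (bit + 1)        # bits above the removed position
    return (high << bit) | low


START_DEBIT_BIT = 1  # assumption: position of START_DEBIT before its removal

before = 58611259    # sysctl_sched_features on af4cf40470 (from the table)
reported = 29305629  # sysctl_sched_features on e0c2ff903c (from the table)

after = drop_bit(before, START_DEBIT_BIT)
print(after, after == reported)
```

Under that assumption the computed mask equals the reported value, which suggests the row reflects the feature-bit layout change rather than a change in workload behavior.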


Disclaimer:
Results have been estimated based on internal Intel analysis and are provided
for informational purposes only. Any difference in system hardware or software
design or configuration may affect actual performance.


--
0-DAY CI Kernel Test Service
https://github.com/intel/lkp-tests/wiki