Re: [lkp-robot] [sched/fair] a4c3c04974: unixbench.score -4.3% regression

From: Vincent Guittot
Date: Mon Jan 08 2018 - 04:34:50 EST


Hi Xiaolong,

On 25 December 2017 at 07:07, kernel test robot <xiaolong.ye@xxxxxxxxx> wrote:
>
> Greeting,
>
> FYI, we noticed a -4.3% regression of unixbench.score due to commit:
>
>
> commit: a4c3c04974d648ee6e1a09ef4131eb32a02ab494 ("sched/fair: Update and fix the runnable propagation rule")
> https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git master
>
> in testcase: unixbench
> on test machine: 8 threads Ivy Bridge with 16G memory
> with following parameters:
>
> runtime: 300s
> nr_task: 100%
> test: shell1
> cpufreq_governor: performance
>
> test-description: UnixBench is the original BYTE UNIX benchmark suite, which aims to test the performance of Unix-like systems.
> test-url: https://github.com/kdlucas/byte-unixbench
>

I don't have the machine described above, so I have tried to reproduce
the problem on my 8-core Cortex-A53 platform, but I don't see the
performance regression there.
I have also tried with a VM on an Intel(R) Core(TM) i7-4810MQ and
haven't seen the regression there either.
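
For reference, a standalone run along the following lines should exercise
the same shell1 workload when lkp is not available. The repository and
options below are only assumptions based on the test-url in the report,
not the robot's exact invocation:

git clone https://github.com/kdlucas/byte-unixbench.git
cd byte-unixbench/UnixBench
make
./Run shell1 -c 8   # 8 parallel copies to match an 8-thread machine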

Have you seen the regression on any other platform?

Regards,
Vincent

>
> Details are as below:
> -------------------------------------------------------------------------------------------------->
>
>
> To reproduce:
>
> git clone https://github.com/intel/lkp-tests.git
> cd lkp-tests
> bin/lkp install job.yaml # job file is attached in this email
> bin/lkp run job.yaml
>
> =========================================================================================
> compiler/cpufreq_governor/kconfig/nr_task/rootfs/runtime/tbox_group/test/testcase:
> gcc-7/performance/x86_64-rhel-7.2/100%/debian-x86_64-2016-08-31.cgz/300s/lkp-ivb-d01/shell1/unixbench
>
> commit:
> c6b9d9a330 ("sched/wait: Fix add_wait_queue() behavioral change")
> a4c3c04974 ("sched/fair: Update and fix the runnable propagation rule")
>
> c6b9d9a330290144 a4c3c04974d648ee6e1a09ef41
> ---------------- --------------------------
> %stddev %change %stddev
> \ | \
> 13264 -4.3% 12694 unixbench.score
> 10619292 -11.7% 9374917 unixbench.time.involuntary_context_switches
> 4.829e+08 -4.3% 4.62e+08 unixbench.time.minor_page_faults
> 1126 -3.6% 1086 unixbench.time.system_time
> 2645 -3.0% 2566 unixbench.time.user_time
> 15855720 -6.2% 14878247 unixbench.time.voluntary_context_switches
> 0.00 ± 56% -0.0 0.00 ± 57% mpstat.cpu.iowait%
> 79517 -5.7% 74990 vmstat.system.cs
> 16361 -3.3% 15822 vmstat.system.in
> 1.814e+08 -24.0% 1.379e+08 cpuidle.C1.time
> 3436399 -20.6% 2728227 cpuidle.C1.usage
> 7772815 -9.9% 7001076 cpuidle.C1E.usage
> 1.479e+08 +66.1% 2.456e+08 cpuidle.C3.time
> 1437889 +38.7% 1994073 cpuidle.C3.usage
> 18147 +13.9% 20676 cpuidle.POLL.usage
> 3436173 -20.6% 2727580 turbostat.C1
> 3.54 -0.8 2.73 turbostat.C1%
> 7772758 -9.9% 7001012 turbostat.C1E
> 1437858 +38.7% 1994034 turbostat.C3
> 2.88 +2.0 4.86 turbostat.C3%
> 18.50 +10.8% 20.50 turbostat.CPU%c1
> 0.54 ± 2% +179.6% 1.51 turbostat.CPU%c3
> 2.32e+12 -4.3% 2.22e+12 perf-stat.branch-instructions
> 6.126e+10 -4.9% 5.823e+10 perf-stat.branch-misses
> 8.64 ± 4% +0.6 9.25 perf-stat.cache-miss-rate%
> 1.662e+11 -4.3% 1.59e+11 perf-stat.cache-references
> 51040611 -7.0% 47473754 perf-stat.context-switches
> 1.416e+13 -3.6% 1.365e+13 perf-stat.cpu-cycles
> 8396968 -3.9% 8065835 perf-stat.cpu-migrations
> 2.919e+12 -4.3% 2.793e+12 perf-stat.dTLB-loads
> 1.89e+12 -4.3% 1.809e+12 perf-stat.dTLB-stores
> 67.97 +1.1 69.03 perf-stat.iTLB-load-miss-rate%
> 4.767e+09 -1.3% 4.704e+09 perf-stat.iTLB-load-misses
> 2.247e+09 -6.0% 2.111e+09 perf-stat.iTLB-loads
> 1.14e+13 -4.3% 1.091e+13 perf-stat.instructions
> 2391 -3.0% 2319 perf-stat.instructions-per-iTLB-miss
> 4.726e+08 -4.3% 4.523e+08 perf-stat.minor-faults
> 4.726e+08 -4.3% 4.523e+08 perf-stat.page-faults
> 585.14 ± 4% -55.0% 263.59 ± 12% sched_debug.cfs_rq:/.load_avg.avg
> 1470 ± 4% -42.2% 850.09 ± 24% sched_debug.cfs_rq:/.load_avg.max
> 154.17 ± 22% -49.2% 78.39 ± 7% sched_debug.cfs_rq:/.load_avg.min
> 438.33 ± 6% -41.9% 254.49 ± 27% sched_debug.cfs_rq:/.load_avg.stddev
> 2540 ± 15% +23.5% 3137 ± 11% sched_debug.cfs_rq:/.removed.runnable_sum.avg
> 181.83 ± 11% -56.3% 79.50 ± 34% sched_debug.cfs_rq:/.runnable_load_avg.avg
> 16.46 ± 37% -72.9% 4.45 ±110% sched_debug.cfs_rq:/.runnable_load_avg.min
> 294.77 ± 5% +11.2% 327.87 ± 6% sched_debug.cfs_rq:/.util_avg.stddev
> 220260 ± 8% +20.3% 264870 ± 4% sched_debug.cpu.avg_idle.avg
> 502903 ± 4% +21.0% 608663 sched_debug.cpu.avg_idle.max
> 148667 ± 6% +29.5% 192468 ± 2% sched_debug.cpu.avg_idle.stddev
> 180.64 ± 10% -53.4% 84.23 ± 34% sched_debug.cpu.cpu_load[0].avg
> 25.73 ± 15% -85.6% 3.70 ±113% sched_debug.cpu.cpu_load[0].min
> 176.98 ± 6% -52.5% 84.06 ± 35% sched_debug.cpu.cpu_load[1].avg
> 53.93 ± 13% -72.6% 14.75 ± 15% sched_debug.cpu.cpu_load[1].min
> 176.61 ± 4% -55.3% 78.92 ± 31% sched_debug.cpu.cpu_load[2].avg
> 73.78 ± 11% -73.4% 19.61 ± 7% sched_debug.cpu.cpu_load[2].min
> 177.42 ± 3% -58.8% 73.09 ± 21% sched_debug.cpu.cpu_load[3].avg
> 93.01 ± 8% -73.9% 24.25 ± 6% sched_debug.cpu.cpu_load[3].min
> 173.36 ± 3% -60.6% 68.26 ± 13% sched_debug.cpu.cpu_load[4].avg
> 274.36 ± 5% -48.6% 141.16 ± 44% sched_debug.cpu.cpu_load[4].max
> 107.87 ± 6% -73.0% 29.11 ± 9% sched_debug.cpu.cpu_load[4].min
> 11203 ± 9% +9.9% 12314 ± 6% sched_debug.cpu.curr->pid.avg
> 1042556 ± 3% -6.9% 970165 ± 2% sched_debug.cpu.sched_goidle.max
> 748905 ± 5% -13.4% 648459 sched_debug.cpu.sched_goidle.min
> 90872 ± 11% +17.4% 106717 ± 5% sched_debug.cpu.sched_goidle.stddev
> 457847 ± 4% -15.0% 389113 sched_debug.cpu.ttwu_local.min
> 18.60 -1.1 17.45 perf-profile.calltrace.cycles-pp.secondary_startup_64
> 16.33 ± 2% -1.0 15.29 perf-profile.calltrace.cycles-pp.cpu_startup_entry.start_secondary.secondary_startup_64
> 16.33 ± 2% -1.0 15.29 perf-profile.calltrace.cycles-pp.start_secondary.secondary_startup_64
> 16.32 ± 2% -1.0 15.29 perf-profile.calltrace.cycles-pp.do_idle.cpu_startup_entry.start_secondary.secondary_startup_64
> 15.44 ± 2% -1.0 14.43 perf-profile.calltrace.cycles-pp.intel_idle.cpuidle_enter_state.do_idle.cpu_startup_entry.start_secondary
> 15.69 ± 2% -1.0 14.71 perf-profile.calltrace.cycles-pp.cpuidle_enter_state.do_idle.cpu_startup_entry.start_secondary.secondary_startup_64
> 5.54 -0.1 5.45 perf-profile.calltrace.cycles-pp.__libc_fork
> 10.28 +0.0 10.32 perf-profile.calltrace.cycles-pp.page_fault
> 10.16 +0.0 10.21 perf-profile.calltrace.cycles-pp.do_page_fault.page_fault
> 10.15 +0.1 10.20 perf-profile.calltrace.cycles-pp.__do_page_fault.do_page_fault.page_fault
> 9.47 +0.1 9.56 perf-profile.calltrace.cycles-pp.__handle_mm_fault.handle_mm_fault.__do_page_fault.do_page_fault.page_fault
> 11.49 +0.1 11.59 perf-profile.calltrace.cycles-pp.sys_execve.do_syscall_64.return_from_SYSCALL_64.execve
> 8.28 +0.1 8.38 perf-profile.calltrace.cycles-pp.load_elf_binary.search_binary_handler.do_execveat_common.sys_execve.do_syscall_64
> 11.49 +0.1 11.59 perf-profile.calltrace.cycles-pp.return_from_SYSCALL_64.execve
> 11.49 +0.1 11.59 perf-profile.calltrace.cycles-pp.do_syscall_64.return_from_SYSCALL_64.execve
> 8.30 +0.1 8.41 perf-profile.calltrace.cycles-pp.search_binary_handler.do_execveat_common.sys_execve.do_syscall_64.return_from_SYSCALL_64
> 11.46 +0.1 11.58 perf-profile.calltrace.cycles-pp.do_execveat_common.sys_execve.do_syscall_64.return_from_SYSCALL_64.execve
> 8.46 +0.1 8.57 perf-profile.calltrace.cycles-pp.handle_mm_fault.__do_page_fault.do_page_fault.page_fault
> 5.21 +0.1 5.34 ± 2% perf-profile.calltrace.cycles-pp.exit_mmap.mmput.do_exit.do_group_exit.__wake_up_parent
> 5.24 +0.1 5.38 ± 2% perf-profile.calltrace.cycles-pp.mmput.do_exit.do_group_exit.__wake_up_parent.entry_SYSCALL_64_fastpath
> 13.20 +0.1 13.34 perf-profile.calltrace.cycles-pp.execve
> 6.79 +0.2 6.94 ± 2% perf-profile.calltrace.cycles-pp.__wake_up_parent.entry_SYSCALL_64_fastpath
> 6.79 +0.2 6.95 ± 2% perf-profile.calltrace.cycles-pp.do_group_exit.__wake_up_parent.entry_SYSCALL_64_fastpath
> 6.78 +0.2 6.94 perf-profile.calltrace.cycles-pp.do_exit.do_group_exit.__wake_up_parent.entry_SYSCALL_64_fastpath
> 5.98 +0.2 6.18 perf-profile.calltrace.cycles-pp.vfprintf.__vsnprintf_chk
> 8.38 +0.2 8.61 perf-profile.calltrace.cycles-pp.__vsnprintf_chk
> 14.17 +0.3 14.49 perf-profile.calltrace.cycles-pp.entry_SYSCALL_64_fastpath
> 18.60 -1.1 17.45 perf-profile.children.cycles-pp.do_idle
> 18.60 -1.1 17.45 perf-profile.children.cycles-pp.cpu_startup_entry
> 18.60 -1.1 17.45 perf-profile.children.cycles-pp.secondary_startup_64
> 17.60 -1.1 16.46 perf-profile.children.cycles-pp.intel_idle
> 17.89 -1.1 16.80 perf-profile.children.cycles-pp.cpuidle_enter_state
> 16.33 Ä 2% -1.0 15.29 perf-profile.children.cycles-pp.start_secondary
> 5.54 -0.1 5.45 perf-profile.children.cycles-pp.__libc_fork
> 16.15 +0.0 16.18 perf-profile.children.cycles-pp.do_page_fault
> 16.19 +0.0 16.22 perf-profile.children.cycles-pp.page_fault
> 6.24 +0.1 6.29 ± 2% perf-profile.children.cycles-pp.filemap_map_pages
> 16.07 +0.1 16.13 perf-profile.children.cycles-pp.__do_page_fault
> 16.85 +0.1 16.92 perf-profile.children.cycles-pp.do_syscall_64
> 16.85 +0.1 16.92 perf-profile.children.cycles-pp.return_from_SYSCALL_64
> 9.22 +0.1 9.33 perf-profile.children.cycles-pp.search_binary_handler
> 13.49 +0.1 13.61 perf-profile.children.cycles-pp.__handle_mm_fault
> 4.89 +0.1 5.02 ± 2% perf-profile.children.cycles-pp.unmap_page_range
> 9.11 +0.1 9.24 perf-profile.children.cycles-pp.load_elf_binary
> 13.20 +0.1 13.34 perf-profile.children.cycles-pp.execve
> 12.82 +0.1 12.96 perf-profile.children.cycles-pp.sys_execve
> 4.95 +0.2 5.10 ± 2% perf-profile.children.cycles-pp.unmap_vmas
> 12.79 +0.2 12.95 perf-profile.children.cycles-pp.do_execveat_common
> 13.90 +0.2 14.07 perf-profile.children.cycles-pp.handle_mm_fault
> 6.95 +0.2 7.13 ± 2% perf-profile.children.cycles-pp.do_exit
> 6.95 +0.2 7.13 ± 2% perf-profile.children.cycles-pp.do_group_exit
> 6.95 +0.2 7.13 ± 2% perf-profile.children.cycles-pp.__wake_up_parent
> 6.40 ± 2% +0.2 6.62 perf-profile.children.cycles-pp.vfprintf
> 8.38 +0.2 8.61 perf-profile.children.cycles-pp.__vsnprintf_chk
> 9.21 +0.2 9.46 perf-profile.children.cycles-pp.mmput
> 9.16 +0.2 9.41 perf-profile.children.cycles-pp.exit_mmap
> 19.85 +0.3 20.13 perf-profile.children.cycles-pp.entry_SYSCALL_64_fastpath
> 17.60 -1.1 16.46 perf-profile.self.cycles-pp.intel_idle
> 6.03 ± 2% +0.2 6.26 perf-profile.self.cycles-pp.vfprintf
>
>
>
> unixbench.score
>
> 14000 +-+-----------------------------------------------------------------+
> O.O..O.O.O.O..O.O.O.O..O.O.O.O..O.O.O..O.O.O.+ +.+.+..+.+.+.+..+.|
> 12000 +-+ : : |
> | : : |
> 10000 +-+ : : |
> | : : |
> 8000 +-+ : : |
> | : : |
> 6000 +-+ : : |
> | : : |
> 4000 +-+ : : |
> | :: |
> 2000 +-+ : |
> | : |
> 0 +-+-----------------------------------------------------------------+
>
>
>
> Disclaimer:
> Results have been estimated based on internal Intel analysis and are provided
> for informational purposes only. Any difference in system hardware or software
> design or configuration may affect actual performance.
>
>
> Thanks,
> Xiaolong