Re: [lkp-robot] [sched/fair] a4c3c04974: unixbench.score -4.3% regression

From: Vincent Guittot
Date: Tue Jan 09 2018 - 02:58:40 EST


Hi,

On 8 January 2018 at 10:34, Vincent Guittot <vincent.guittot@xxxxxxxxxx> wrote:
> Hi Xiaolong,
>
> On 25 December 2017 at 07:07, kernel test robot <xiaolong.ye@xxxxxxxxx> wrote:
>>
>> Greeting,
>>
>> FYI, we noticed a -4.3% regression of unixbench.score due to commit:
>>
>>
>> commit: a4c3c04974d648ee6e1a09ef4131eb32a02ab494 ("sched/fair: Update and fix the runnable propagation rule")
>> https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git master
>>
>> in testcase: unixbench
>> on test machine: 8-thread Ivy Bridge with 16G memory
>> with the following parameters:
>>
>> runtime: 300s
>> nr_task: 100%
>> test: shell1
>> cpufreq_governor: performance
>>
>> test-description: UnixBench is the original BYTE UNIX benchmark suite; it aims to test the performance of Unix-like systems.
>> test-url: https://github.com/kdlucas/byte-unixbench
>>
>
> I don't have the machine described above, so I have tried to reproduce
> the problem on my 8-core Cortex-A53 platform, but I don't see the
> performance regression there.
> I have also tried with a VM on an Intel(R) Core(TM) i7-4810MQ and
> haven't seen the regression either.
>
> Have you seen the regression on any other platforms?

I have been able to run the test on a 12-core Intel(R) Xeon(R) CPU
E5-2630 and haven't seen any regression there either.
I changed the command to ./Run Shell1 -c 12 -i 30 instead of
./Run Shell1 -c 8 -i 30, as the machine has more cores.
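
For anyone wanting to check on other hardware without the lkp tooling,
the suite can also be run directly from the test-url quoted below;
something like the following should be close to what lkp does (adjust
-c to the number of cores of the machine):

  git clone https://github.com/kdlucas/byte-unixbench.git
  cd byte-unixbench/UnixBench
  ./Run Shell1 -c 12 -i 30   # -c: parallel copies, -i: iterations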

Regards,
Vincent

>
> Regards,
> Vincent
>
>>
>> Details are as below:
>> -------------------------------------------------------------------------------------------------->
>>
>>
>> To reproduce:
>>
>> git clone https://github.com/intel/lkp-tests.git
>> cd lkp-tests
>> bin/lkp install job.yaml # job file is attached in this email
>> bin/lkp run job.yaml
>>
>> =========================================================================================
>> compiler/cpufreq_governor/kconfig/nr_task/rootfs/runtime/tbox_group/test/testcase:
>> gcc-7/performance/x86_64-rhel-7.2/100%/debian-x86_64-2016-08-31.cgz/300s/lkp-ivb-d01/shell1/unixbench
>>
>> commit:
>> c6b9d9a330 ("sched/wait: Fix add_wait_queue() behavioral change")
>> a4c3c04974 ("sched/fair: Update and fix the runnable propagation rule")
>>
>> c6b9d9a330290144 a4c3c04974d648ee6e1a09ef41
>> ---------------- --------------------------
>> %stddev %change %stddev
>> \ | \
>> 13264 -4.3% 12694 unixbench.score
>> 10619292 -11.7% 9374917 unixbench.time.involuntary_context_switches
>> 4.829e+08 -4.3% 4.62e+08 unixbench.time.minor_page_faults
>> 1126 -3.6% 1086 unixbench.time.system_time
>> 2645 -3.0% 2566 unixbench.time.user_time
>> 15855720 -6.2% 14878247 unixbench.time.voluntary_context_switches
>> 0.00 ± 56% -0.0 0.00 ± 57% mpstat.cpu.iowait%
>> 79517 -5.7% 74990 vmstat.system.cs
>> 16361 -3.3% 15822 vmstat.system.in
>> 1.814e+08 -24.0% 1.379e+08 cpuidle.C1.time
>> 3436399 -20.6% 2728227 cpuidle.C1.usage
>> 7772815 -9.9% 7001076 cpuidle.C1E.usage
>> 1.479e+08 +66.1% 2.456e+08 cpuidle.C3.time
>> 1437889 +38.7% 1994073 cpuidle.C3.usage
>> 18147 +13.9% 20676 cpuidle.POLL.usage
>> 3436173 -20.6% 2727580 turbostat.C1
>> 3.54 -0.8 2.73 turbostat.C1%
>> 7772758 -9.9% 7001012 turbostat.C1E
>> 1437858 +38.7% 1994034 turbostat.C3
>> 2.88 +2.0 4.86 turbostat.C3%
>> 18.50 +10.8% 20.50 turbostat.CPU%c1
>> 0.54 ± 2% +179.6% 1.51 turbostat.CPU%c3
>> 2.32e+12 -4.3% 2.22e+12 perf-stat.branch-instructions
>> 6.126e+10 -4.9% 5.823e+10 perf-stat.branch-misses
>> 8.64 ± 4% +0.6 9.25 perf-stat.cache-miss-rate%
>> 1.662e+11 -4.3% 1.59e+11 perf-stat.cache-references
>> 51040611 -7.0% 47473754 perf-stat.context-switches
>> 1.416e+13 -3.6% 1.365e+13 perf-stat.cpu-cycles
>> 8396968 -3.9% 8065835 perf-stat.cpu-migrations
>> 2.919e+12 -4.3% 2.793e+12 perf-stat.dTLB-loads
>> 1.89e+12 -4.3% 1.809e+12 perf-stat.dTLB-stores
>> 67.97 +1.1 69.03 perf-stat.iTLB-load-miss-rate%
>> 4.767e+09 -1.3% 4.704e+09 perf-stat.iTLB-load-misses
>> 2.247e+09 -6.0% 2.111e+09 perf-stat.iTLB-loads
>> 1.14e+13 -4.3% 1.091e+13 perf-stat.instructions
>> 2391 -3.0% 2319 perf-stat.instructions-per-iTLB-miss
>> 4.726e+08 -4.3% 4.523e+08 perf-stat.minor-faults
>> 4.726e+08 -4.3% 4.523e+08 perf-stat.page-faults
>> 585.14 ± 4% -55.0% 263.59 ± 12% sched_debug.cfs_rq:/.load_avg.avg
>> 1470 ± 4% -42.2% 850.09 ± 24% sched_debug.cfs_rq:/.load_avg.max
>> 154.17 ± 22% -49.2% 78.39 ± 7% sched_debug.cfs_rq:/.load_avg.min
>> 438.33 ± 6% -41.9% 254.49 ± 27% sched_debug.cfs_rq:/.load_avg.stddev
>> 2540 ± 15% +23.5% 3137 ± 11% sched_debug.cfs_rq:/.removed.runnable_sum.avg
>> 181.83 ± 11% -56.3% 79.50 ± 34% sched_debug.cfs_rq:/.runnable_load_avg.avg
>> 16.46 ± 37% -72.9% 4.45 ±110% sched_debug.cfs_rq:/.runnable_load_avg.min
>> 294.77 ± 5% +11.2% 327.87 ± 6% sched_debug.cfs_rq:/.util_avg.stddev
>> 220260 ± 8% +20.3% 264870 ± 4% sched_debug.cpu.avg_idle.avg
>> 502903 ± 4% +21.0% 608663 sched_debug.cpu.avg_idle.max
>> 148667 ± 6% +29.5% 192468 ± 2% sched_debug.cpu.avg_idle.stddev
>> 180.64 ± 10% -53.4% 84.23 ± 34% sched_debug.cpu.cpu_load[0].avg
>> 25.73 ± 15% -85.6% 3.70 ±113% sched_debug.cpu.cpu_load[0].min
>> 176.98 ± 6% -52.5% 84.06 ± 35% sched_debug.cpu.cpu_load[1].avg
>> 53.93 ± 13% -72.6% 14.75 ± 15% sched_debug.cpu.cpu_load[1].min
>> 176.61 ± 4% -55.3% 78.92 ± 31% sched_debug.cpu.cpu_load[2].avg
>> 73.78 ± 11% -73.4% 19.61 ± 7% sched_debug.cpu.cpu_load[2].min
>> 177.42 ± 3% -58.8% 73.09 ± 21% sched_debug.cpu.cpu_load[3].avg
>> 93.01 ± 8% -73.9% 24.25 ± 6% sched_debug.cpu.cpu_load[3].min
>> 173.36 ± 3% -60.6% 68.26 ± 13% sched_debug.cpu.cpu_load[4].avg
>> 274.36 ± 5% -48.6% 141.16 ± 44% sched_debug.cpu.cpu_load[4].max
>> 107.87 ± 6% -73.0% 29.11 ± 9% sched_debug.cpu.cpu_load[4].min
>> 11203 ± 9% +9.9% 12314 ± 6% sched_debug.cpu.curr->pid.avg
>> 1042556 ± 3% -6.9% 970165 ± 2% sched_debug.cpu.sched_goidle.max
>> 748905 ± 5% -13.4% 648459 sched_debug.cpu.sched_goidle.min
>> 90872 ± 11% +17.4% 106717 ± 5% sched_debug.cpu.sched_goidle.stddev
>> 457847 ± 4% -15.0% 389113 sched_debug.cpu.ttwu_local.min
>> 18.60 -1.1 17.45 perf-profile.calltrace.cycles-pp.secondary_startup_64
>> 16.33 ± 2% -1.0 15.29 perf-profile.calltrace.cycles-pp.cpu_startup_entry.start_secondary.secondary_startup_64
>> 16.33 ± 2% -1.0 15.29 perf-profile.calltrace.cycles-pp.start_secondary.secondary_startup_64
>> 16.32 ± 2% -1.0 15.29 perf-profile.calltrace.cycles-pp.do_idle.cpu_startup_entry.start_secondary.secondary_startup_64
>> 15.44 ± 2% -1.0 14.43 perf-profile.calltrace.cycles-pp.intel_idle.cpuidle_enter_state.do_idle.cpu_startup_entry.start_secondary
>> 15.69 ± 2% -1.0 14.71 perf-profile.calltrace.cycles-pp.cpuidle_enter_state.do_idle.cpu_startup_entry.start_secondary.secondary_startup_64
>> 5.54 -0.1 5.45 perf-profile.calltrace.cycles-pp.__libc_fork
>> 10.28 +0.0 10.32 perf-profile.calltrace.cycles-pp.page_fault
>> 10.16 +0.0 10.21 perf-profile.calltrace.cycles-pp.do_page_fault.page_fault
>> 10.15 +0.1 10.20 perf-profile.calltrace.cycles-pp.__do_page_fault.do_page_fault.page_fault
>> 9.47 +0.1 9.56 perf-profile.calltrace.cycles-pp.__handle_mm_fault.handle_mm_fault.__do_page_fault.do_page_fault.page_fault
>> 11.49 +0.1 11.59 perf-profile.calltrace.cycles-pp.sys_execve.do_syscall_64.return_from_SYSCALL_64.execve
>> 8.28 +0.1 8.38 perf-profile.calltrace.cycles-pp.load_elf_binary.search_binary_handler.do_execveat_common.sys_execve.do_syscall_64
>> 11.49 +0.1 11.59 perf-profile.calltrace.cycles-pp.return_from_SYSCALL_64.execve
>> 11.49 +0.1 11.59 perf-profile.calltrace.cycles-pp.do_syscall_64.return_from_SYSCALL_64.execve
>> 8.30 +0.1 8.41 perf-profile.calltrace.cycles-pp.search_binary_handler.do_execveat_common.sys_execve.do_syscall_64.return_from_SYSCALL_64
>> 11.46 +0.1 11.58 perf-profile.calltrace.cycles-pp.do_execveat_common.sys_execve.do_syscall_64.return_from_SYSCALL_64.execve
>> 8.46 +0.1 8.57 perf-profile.calltrace.cycles-pp.handle_mm_fault.__do_page_fault.do_page_fault.page_fault
>> 5.21 +0.1 5.34 ± 2% perf-profile.calltrace.cycles-pp.exit_mmap.mmput.do_exit.do_group_exit.__wake_up_parent
>> 5.24 +0.1 5.38 ± 2% perf-profile.calltrace.cycles-pp.mmput.do_exit.do_group_exit.__wake_up_parent.entry_SYSCALL_64_fastpath
>> 13.20 +0.1 13.34 perf-profile.calltrace.cycles-pp.execve
>> 6.79 +0.2 6.94 ± 2% perf-profile.calltrace.cycles-pp.__wake_up_parent.entry_SYSCALL_64_fastpath
>> 6.79 +0.2 6.95 ± 2% perf-profile.calltrace.cycles-pp.do_group_exit.__wake_up_parent.entry_SYSCALL_64_fastpath
>> 6.78 +0.2 6.94 perf-profile.calltrace.cycles-pp.do_exit.do_group_exit.__wake_up_parent.entry_SYSCALL_64_fastpath
>> 5.98 +0.2 6.18 perf-profile.calltrace.cycles-pp.vfprintf.__vsnprintf_chk
>> 8.38 +0.2 8.61 perf-profile.calltrace.cycles-pp.__vsnprintf_chk
>> 14.17 +0.3 14.49 perf-profile.calltrace.cycles-pp.entry_SYSCALL_64_fastpath
>> 18.60 -1.1 17.45 perf-profile.children.cycles-pp.do_idle
>> 18.60 -1.1 17.45 perf-profile.children.cycles-pp.cpu_startup_entry
>> 18.60 -1.1 17.45 perf-profile.children.cycles-pp.secondary_startup_64
>> 17.60 -1.1 16.46 perf-profile.children.cycles-pp.intel_idle
>> 17.89 -1.1 16.80 perf-profile.children.cycles-pp.cpuidle_enter_state
>> 16.33 ± 2% -1.0 15.29 perf-profile.children.cycles-pp.start_secondary
>> 5.54 -0.1 5.45 perf-profile.children.cycles-pp.__libc_fork
>> 16.15 +0.0 16.18 perf-profile.children.cycles-pp.do_page_fault
>> 16.19 +0.0 16.22 perf-profile.children.cycles-pp.page_fault
>> 6.24 +0.1 6.29 ± 2% perf-profile.children.cycles-pp.filemap_map_pages
>> 16.07 +0.1 16.13 perf-profile.children.cycles-pp.__do_page_fault
>> 16.85 +0.1 16.92 perf-profile.children.cycles-pp.do_syscall_64
>> 16.85 +0.1 16.92 perf-profile.children.cycles-pp.return_from_SYSCALL_64
>> 9.22 +0.1 9.33 perf-profile.children.cycles-pp.search_binary_handler
>> 13.49 +0.1 13.61 perf-profile.children.cycles-pp.__handle_mm_fault
>> 4.89 +0.1 5.02 ± 2% perf-profile.children.cycles-pp.unmap_page_range
>> 9.11 +0.1 9.24 perf-profile.children.cycles-pp.load_elf_binary
>> 13.20 +0.1 13.34 perf-profile.children.cycles-pp.execve
>> 12.82 +0.1 12.96 perf-profile.children.cycles-pp.sys_execve
>> 4.95 +0.2 5.10 ± 2% perf-profile.children.cycles-pp.unmap_vmas
>> 12.79 +0.2 12.95 perf-profile.children.cycles-pp.do_execveat_common
>> 13.90 +0.2 14.07 perf-profile.children.cycles-pp.handle_mm_fault
>> 6.95 +0.2 7.13 ± 2% perf-profile.children.cycles-pp.do_exit
>> 6.95 +0.2 7.13 ± 2% perf-profile.children.cycles-pp.do_group_exit
>> 6.95 +0.2 7.13 ± 2% perf-profile.children.cycles-pp.__wake_up_parent
>> 6.40 ± 2% +0.2 6.62 perf-profile.children.cycles-pp.vfprintf
>> 8.38 +0.2 8.61 perf-profile.children.cycles-pp.__vsnprintf_chk
>> 9.21 +0.2 9.46 perf-profile.children.cycles-pp.mmput
>> 9.16 +0.2 9.41 perf-profile.children.cycles-pp.exit_mmap
>> 19.85 +0.3 20.13 perf-profile.children.cycles-pp.entry_SYSCALL_64_fastpath
>> 17.60 -1.1 16.46 perf-profile.self.cycles-pp.intel_idle
>> 6.03 ± 2% +0.2 6.26 perf-profile.self.cycles-pp.vfprintf
>>
>>
>>
>> unixbench.score
>>
>> 14000 +-+-----------------------------------------------------------------+
>> O.O..O.O.O.O..O.O.O.O..O.O.O.O..O.O.O..O.O.O.+ +.+.+..+.+.+.+..+.|
>> 12000 +-+ : : |
>> | : : |
>> 10000 +-+ : : |
>> | : : |
>> 8000 +-+ : : |
>> | : : |
>> 6000 +-+ : : |
>> | : : |
>> 4000 +-+ : : |
>> | :: |
>> 2000 +-+ : |
>> | : |
>> 0 +-+-----------------------------------------------------------------+
>>
>>
>>
>> Disclaimer:
>> Results have been estimated based on internal Intel analysis and are provided
>> for informational purposes only. Any difference in system hardware or software
>> design or configuration may affect actual performance.
>>
>>
>> Thanks,
>> Xiaolong