Re: [LKP] [RAID5] 878ee679279: -1.8% vmstat.io.bo, +40.5% perf-stat.LLC-load-misses

From: NeilBrown
Date: Thu Apr 23 2015 - 22:16:16 EST


On Thu, 23 Apr 2015 14:55:59 +0800 Huang Ying <ying.huang@xxxxxxxxx> wrote:

> FYI, we noticed the below changes on
>
> git://neil.brown.name/md for-next
> commit 878ee6792799e2f88bdcac329845efadb205252f ("RAID5: batch adjacent full stripe write")
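
[Context for readers of the archive: 878ee679 makes raid5 chain
neighbouring full-stripe writes together so that raid5d can process
them as one batch instead of stripe by stripe. The toy C program below
sketches only the shape of that idea; the struct fields, helper names,
and the adjacency test are illustrative, not the actual raid5.c code.]

	#include <stdbool.h>
	#include <stdio.h>

	#define STRIPE_SECTORS 8	/* illustrative stripe width, in sectors */

	struct stripe_head {
		unsigned long long sector;	/* first sector of this stripe */
		bool full_write;		/* every data block overwritten? */
		struct stripe_head *batch_next;	/* next stripe in the batch */
	};

	/* Only stripes that need no reads (full writes) may join a batch. */
	static bool stripe_can_batch(const struct stripe_head *sh)
	{
		return sh->full_write;
	}

	/* Chain sh onto head when both are full writes and sh is the
	 * stripe immediately following head on disk. */
	static bool try_add_to_batch(struct stripe_head *head,
				     struct stripe_head *sh)
	{
		if (!stripe_can_batch(head) || !stripe_can_batch(sh))
			return false;
		if (sh->sector != head->sector + STRIPE_SECTORS)
			return false;
		head->batch_next = sh;
		return true;
	}

	int main(void)
	{
		struct stripe_head a = { .sector = 0, .full_write = true };
		struct stripe_head b = { .sector = STRIPE_SECTORS, .full_write = true };
		struct stripe_head c = { .sector = 3 * STRIPE_SECTORS, .full_write = true };

		printf("a+b batched: %d\n", try_add_to_batch(&a, &b)); /* 1: adjacent */
		printf("a+c batched: %d\n", try_add_to_batch(&a, &c)); /* 0: gap between them */
		return 0;
	}

[If batching works as sketched, per-stripe bookkeeping runs once per
batch rather than once per stripe, which would be consistent with
analyse_stripe almost vanishing from the perf profile below.]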

Hi,
is there any chance that you could explain what some of this means?
There is lots of data and some very pretty graphs, but no explanation.

Which numbers are "good", which are "bad", and which is "worst"?
What do the graphs really show, and what would we like to see in them?

I think it is really great that you are doing this testing and reporting the
results. It's just so sad that I completely fail to understand them.

Thanks,
NeilBrown

>
>
> testbox/testcase/testparams: lkp-st02/dd-write/300-5m-11HDD-RAID5-cfq-xfs-1dd
>
> a87d7f782b47e030            878ee6792799e2f88bdcac3298
> ----------------            --------------------------
>          %stddev     %change         %stddev
>              \          |                \
> 59035 ± 0% +18.4% 69913 ± 1% softirqs.SCHED
> 1330 ± 10% +17.4% 1561 ± 4% slabinfo.kmalloc-512.num_objs
> 1330 ± 10% +17.4% 1561 ± 4% slabinfo.kmalloc-512.active_objs
> 305908 ± 0% -1.8% 300427 ± 0% vmstat.io.bo
> 1 ± 0% +100.0% 2 ± 0% vmstat.procs.r
> 8266 ± 1% -15.7% 6968 ± 0% vmstat.system.cs
> 14819 ± 0% -2.1% 14503 ± 0% vmstat.system.in
> 18.20 ± 6% +10.2% 20.05 ± 4% perf-profile.cpu-cycles.raid_run_ops.handle_stripe.handle_active_stripes.raid5d.md_thread
> 1.94 ± 9% +90.6% 3.70 ± 9% perf-profile.cpu-cycles.async_xor.raid_run_ops.handle_stripe.handle_active_stripes.raid5d
> 0.00 ± 0% +Inf% 25.18 ± 3% perf-profile.cpu-cycles.handle_active_stripes.isra.45.raid5d.md_thread.kthread.ret_from_fork
> 0.00 ± 0% +Inf% 14.14 ± 4% perf-profile.cpu-cycles.async_copy_data.isra.42.raid_run_ops.handle_stripe.handle_active_stripes.raid5d
> 1.79 ± 7% +102.9% 3.64 ± 9% perf-profile.cpu-cycles.xor_blocks.async_xor.raid_run_ops.handle_stripe.handle_active_stripes
> 3.09 ± 4% -10.8% 2.76 ± 4% perf-profile.cpu-cycles.get_active_stripe.make_request.md_make_request.generic_make_request.submit_bio
> 0.80 ± 14% +28.1% 1.02 ± 10% perf-profile.cpu-cycles.mutex_lock.xfs_file_buffered_aio_write.xfs_file_write_iter.new_sync_write.vfs_write
> 14.78 ± 6% -100.0% 0.00 ± 0% perf-profile.cpu-cycles.async_copy_data.isra.38.raid_run_ops.handle_stripe.handle_active_stripes.raid5d
> 25.68 ± 4% -100.0% 0.00 ± 0% perf-profile.cpu-cycles.handle_active_stripes.isra.41.raid5d.md_thread.kthread.ret_from_fork
> 1.23 ± 5% +140.0% 2.96 ± 7% perf-profile.cpu-cycles.xor_sse_5_pf64.xor_blocks.async_xor.raid_run_ops.handle_stripe
> 2.62 ± 6% -95.6% 0.12 ± 33% perf-profile.cpu-cycles.analyse_stripe.handle_stripe.handle_active_stripes.raid5d.md_thread
> 0.96 ± 9% +17.5% 1.12 ± 2% perf-profile.cpu-cycles.xfs_ilock.xfs_file_buffered_aio_write.xfs_file_write_iter.new_sync_write.vfs_write
> 1.461e+10 ± 0% -5.3% 1.384e+10 ± 1% perf-stat.L1-dcache-load-misses
> 3.688e+11 ± 0% -2.7% 3.59e+11 ± 0% perf-stat.L1-dcache-loads
> 1.124e+09 ± 0% -27.7% 8.125e+08 ± 0% perf-stat.L1-dcache-prefetches
> 2.767e+10 ± 0% -1.8% 2.717e+10 ± 0% perf-stat.L1-dcache-store-misses
> 2.352e+11 ± 0% -2.8% 2.287e+11 ± 0% perf-stat.L1-dcache-stores
> 6.774e+09 ± 0% -2.3% 6.62e+09 ± 0% perf-stat.L1-icache-load-misses
> 5.571e+08 ± 0% +40.5% 7.826e+08 ± 1% perf-stat.LLC-load-misses
> 6.263e+09 ± 0% -13.7% 5.407e+09 ± 1% perf-stat.LLC-loads
> 1.914e+11 ± 0% -4.2% 1.833e+11 ± 0% perf-stat.branch-instructions
> 1.145e+09 ± 2% -5.6% 1.081e+09 ± 0% perf-stat.branch-load-misses
> 1.911e+11 ± 0% -4.3% 1.829e+11 ± 0% perf-stat.branch-loads
> 1.142e+09 ± 2% -5.1% 1.083e+09 ± 0% perf-stat.branch-misses
> 1.218e+09 ± 0% +19.8% 1.46e+09 ± 0% perf-stat.cache-misses
> 2.118e+10 ± 0% -5.2% 2.007e+10 ± 0% perf-stat.cache-references
> 2510308 ± 1% -15.7% 2115410 ± 0% perf-stat.context-switches
> 39623 ± 0% +22.1% 48370 ± 1% perf-stat.cpu-migrations
> 4.179e+08 ± 40% +165.7% 1.111e+09 ± 35% perf-stat.dTLB-load-misses
> 3.684e+11 ± 0% -2.5% 3.592e+11 ± 0% perf-stat.dTLB-loads
> 1.232e+08 ± 15% +62.5% 2.002e+08 ± 27% perf-stat.dTLB-store-misses
> 2.348e+11 ± 0% -2.5% 2.288e+11 ± 0% perf-stat.dTLB-stores
> 3577297 ± 2% +8.7% 3888986 ± 1% perf-stat.iTLB-load-misses
> 1.035e+12 ± 0% -3.5% 9.988e+11 ± 0% perf-stat.iTLB-loads
> 1.036e+12 ± 0% -3.7% 9.978e+11 ± 0% perf-stat.instructions
> 594 ± 30% +130.3% 1369 ± 13% sched_debug.cfs_rq[0]:/.blocked_load_avg
> 17 ± 10% -28.2% 12 ± 23% sched_debug.cfs_rq[0]:/.nr_spread_over
> 210 ± 21% +42.1% 298 ± 28% sched_debug.cfs_rq[0]:/.tg_runnable_contrib
> 9676 ± 21% +42.1% 13754 ± 28% sched_debug.cfs_rq[0]:/.avg->runnable_avg_sum
> 772 ± 25% +116.5% 1672 ± 9% sched_debug.cfs_rq[0]:/.tg_load_contrib
> 8402 ± 9% +83.3% 15405 ± 11% sched_debug.cfs_rq[0]:/.tg_load_avg
> 8356 ± 9% +82.8% 15272 ± 11% sched_debug.cfs_rq[1]:/.tg_load_avg
> 968 ± 25% +100.8% 1943 ± 14% sched_debug.cfs_rq[1]:/.blocked_load_avg
> 16242 ± 9% -22.2% 12643 ± 14% sched_debug.cfs_rq[1]:/.avg->runnable_avg_sum
> 353 ± 9% -22.1% 275 ± 14% sched_debug.cfs_rq[1]:/.tg_runnable_contrib
> 1183 ± 23% +77.7% 2102 ± 12% sched_debug.cfs_rq[1]:/.tg_load_contrib
> 181 ± 8% -31.4% 124 ± 26% sched_debug.cfs_rq[2]:/.tg_runnable_contrib
> 8364 ± 8% -31.3% 5745 ± 26% sched_debug.cfs_rq[2]:/.avg->runnable_avg_sum
> 8297 ± 9% +81.7% 15079 ± 12% sched_debug.cfs_rq[2]:/.tg_load_avg
> 30439 ± 13% -45.2% 16681 ± 26% sched_debug.cfs_rq[2]:/.exec_clock
> 39735 ± 14% -48.3% 20545 ± 29% sched_debug.cfs_rq[2]:/.min_vruntime
> 8231 ± 10% +82.2% 15000 ± 12% sched_debug.cfs_rq[3]:/.tg_load_avg
> 1210 ± 14% +110.3% 2546 ± 30% sched_debug.cfs_rq[4]:/.tg_load_contrib
> 8188 ± 10% +82.8% 14964 ± 12% sched_debug.cfs_rq[4]:/.tg_load_avg
> 8132 ± 10% +83.1% 14890 ± 12% sched_debug.cfs_rq[5]:/.tg_load_avg
> 749 ± 29% +205.9% 2292 ± 34% sched_debug.cfs_rq[5]:/.blocked_load_avg
> 963 ± 30% +169.9% 2599 ± 33% sched_debug.cfs_rq[5]:/.tg_load_contrib
> 37791 ± 32% -38.6% 23209 ± 13% sched_debug.cfs_rq[6]:/.min_vruntime
> 693 ± 25% +132.2% 1609 ± 29% sched_debug.cfs_rq[6]:/.blocked_load_avg
> 10838 ± 13% -39.2% 6587 ± 13% sched_debug.cfs_rq[6]:/.avg->runnable_avg_sum
> 29329 ± 27% -33.2% 19577 ± 10% sched_debug.cfs_rq[6]:/.exec_clock
> 235 ± 14% -39.7% 142 ± 14% sched_debug.cfs_rq[6]:/.tg_runnable_contrib
> 8085 ± 10% +83.6% 14848 ± 12% sched_debug.cfs_rq[6]:/.tg_load_avg
> 839 ± 25% +128.5% 1917 ± 18% sched_debug.cfs_rq[6]:/.tg_load_contrib
> 8051 ± 10% +83.6% 14779 ± 12% sched_debug.cfs_rq[7]:/.tg_load_avg
> 156 ± 34% +97.9% 309 ± 19% sched_debug.cpu#0.cpu_load[4]
> 160 ± 25% +64.0% 263 ± 16% sched_debug.cpu#0.cpu_load[2]
> 156 ± 32% +83.7% 286 ± 17% sched_debug.cpu#0.cpu_load[3]
> 164 ± 20% -35.1% 106 ± 31% sched_debug.cpu#2.cpu_load[0]
> 249 ± 15% +80.2% 449 ± 10% sched_debug.cpu#4.cpu_load[3]
> 231 ± 11% +101.2% 466 ± 13% sched_debug.cpu#4.cpu_load[2]
> 217 ± 14% +189.9% 630 ± 38% sched_debug.cpu#4.cpu_load[0]
> 71951 ± 5% +21.6% 87526 ± 7% sched_debug.cpu#4.nr_load_updates
> 214 ± 8% +146.1% 527 ± 27% sched_debug.cpu#4.cpu_load[1]
> 256 ± 17% +75.7% 449 ± 13% sched_debug.cpu#4.cpu_load[4]
> 209 ± 23% +98.3% 416 ± 48% sched_debug.cpu#5.cpu_load[2]
> 68024 ± 2% +18.8% 80825 ± 1% sched_debug.cpu#5.nr_load_updates
> 217 ± 26% +74.9% 380 ± 45% sched_debug.cpu#5.cpu_load[3]
> 852 ± 21% -38.3% 526 ± 22% sched_debug.cpu#6.curr->pid
>
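
[Note on reading the table: %change appears to be the mean of the
second (878ee679) column relative to the first (a87d7f78), and the
± figures are relative standard deviations across runs. A quick check
of the perf-stat.LLC-load-misses row, assuming
%change = (new - old) / old * 100:]

	#include <stdio.h>

	int main(void)
	{
		double old_mean = 5.571e8;	/* a87d7f78, left column */
		double new_mean = 7.826e8;	/* 878ee679, right column */

		/* assumed formula: (new - old) / old * 100 */
		printf("%+.1f%%\n", (new_mean - old_mean) / old_mean * 100.0);
		/* prints +40.5%, matching the row above */
		return 0;
	}
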
> lkp-st02: Core2
> Memory: 8G
>
>
>
>
> perf-stat.cache-misses
>
> 1.6e+09 O+-----O--O---O--O---O--------------------------------------------+
> | O O O O O O O O O O |
> 1.4e+09 ++ |
> 1.2e+09 *+.*...* *..* * *...*..*...*..*...*..*...*..*...*..*
> | : : : : : |
> 1e+09 ++ : : : : : : |
> | : : : : : : |
> 8e+08 ++ : : : : : : |
> | : : : : : : |
> 6e+08 ++ : : : : : : |
> 4e+08 ++ : : : : : : |
> | : : : : : : |
> 2e+08 ++ : : : : : : |
> | : : : |
> 0 ++-O------*----------*------*-------------------------------------+
>
>
> perf-stat.L1-dcache-prefetches
>
> 1.2e+09 ++----------------------------------------------------------------+
> *..*...* *..* * ..*.. ..*..*...*..*...*..*...*..*
> 1e+09 ++ : : : : *. *. |
> | : : : :: : |
> | : : : : : : O |
> 8e+08 O+ O: O :O O: O :O: O :O O O O O O O |
> | : : : : : : |
> 6e+08 ++ : : : : : : |
> | : : : : : : |
> 4e+08 ++ : : : : : : |
> | : : : : : : |
> | : : : : : : |
> 2e+08 ++ :: :: : : |
> | : : : |
> 0 ++-O------*----------*------*-------------------------------------+
>
>
> perf-stat.LLC-load-misses
>
> 1e+09 ++------------------------------------------------------------------+
> 9e+08 O+ O O O O O |
> | O O O O |
> 8e+08 ++ O O O O O O |
> 7e+08 ++ |
> | |
> 6e+08 *+..*..* *...* * *...*..*...*...*..*...*..*...*..*...*
> 5e+08 ++ : : : :: : |
> 4e+08 ++ : : : : : : |
> | : : : : : : |
> 3e+08 ++ : : : : : : |
> 2e+08 ++ : : : : : : |
> | : : : : : : |
> 1e+08 ++ : :: : |
> 0 ++--O------*---------*-------*--------------------------------------+
>
>
> perf-stat.context-switches
>
> 3e+06 ++----------------------------------------------------------------+
> | *...*..*... |
> 2.5e+06 *+.*...* *..* * : *..*... .*...*..*... .*
> | : : : : : *. *. |
> O O: O :O O: O :: : O O O O O O |
> 2e+06 ++ : : : :O: O :O O |
> | : : : : : : |
> 1.5e+06 ++ : : : : : : |
> | : : : : : : |
> 1e+06 ++ : : : : : : |
> | : : : : : : |
> | : : : : : : |
> 500000 ++ :: : : :: |
> | : : : |
> 0 ++-O------*----------*------*-------------------------------------+
>
>
> vmstat.system.cs
>
> 10000 ++------------------------------------------------------------------+
> 9000 ++ *...*.. |
> *...*..* *...* * : *...*...*.. ..*..*...*.. ..*
> 8000 ++ : : : : : *. *. |
> 7000 O+ O: O O O: O : : : O O O O O O |
> | : : : :O: O :O O |
> 6000 ++ : : : : : : |
> 5000 ++ : : : : : : |
> 4000 ++ : : : : : : |
> | : : : : : : |
> 3000 ++ : : : : : : |
> 2000 ++ : : : : : : |
> | : : :: :: |
> 1000 ++ : : : |
> 0 ++--O------*---------*-------*--------------------------------------+
>
>
> [*] bisect-good sample
> [O] bisect-bad sample
>
> To reproduce:
>
> apt-get install ruby # the LKP test harness is written in Ruby
> git clone git://git.kernel.org/pub/scm/linux/kernel/git/wfg/lkp-tests.git
> cd lkp-tests
> bin/setup-local job.yaml # the job file attached in this email
> bin/run-local job.yaml # run the dd-write job on the local machine
>
>
> Disclaimer:
> Results have been estimated based on internal Intel analysis and are provided
> for informational purposes only. Any difference in system hardware or software
> design or configuration may affect actual performance.
>
>
> Thanks,
> Ying Huang
>

Attachment: pgp2baQGzvgzJ.pgp
Description: OpenPGP digital signature