performance changes on 78373b73: -46.6% fsmark.files_per_sec, and a few more

From: Yuanhan Liu
Date: Tue Apr 21 2015 - 00:33:01 EST


FYI, we noticed changes in `fsmark.files_per_sec' caused by commit 78373b7319abdf15050af5b1632c4c8b8b398f33:

> commit 78373b7319abdf15050af5b1632c4c8b8b398f33
> Author: Jaegeuk Kim <jaegeuk@xxxxxxxxxx>
> AuthorDate: Fri Mar 13 21:44:36 2015 -0700
> Commit: Jaegeuk Kim <jaegeuk@xxxxxxxxxx>
> CommitDate: Fri Apr 10 15:08:45 2015 -0700
>
> f2fs: enhance multi-threads performance

3402e87cfb5e762f9c95071bf4a2ad65fd9392a2  78373b7319abdf15050af5b1632c4c8b8b398f33
----------------------------------------  ----------------------------------------
run  time(m)  metric_value  ±stddev        run  time(m)  metric_value  ±stddev        change   testbox/benchmark/testcase-params
---  -------  ------------  -------        ---  -------  ------------  -------        -------  ---------------------------------
  3      0.3     |490.800|     ±5.7          3      0.5     |262.067|     ±0.4        -46.6%   ivb44/fsmark/1x-64t-4BRD_12G-RAID0-f2fs-4M-30G-fsyncBeforeClose
  3      0.3     |468.367|     ±3.5          3      0.5     |264.467|     ±0.2        -43.5%   ivb44/fsmark/1x-64t-9BRD_6G-RAID0-f2fs-4M-30G-fsyncBeforeClose
  3      0.6     |211.867|     ±0.7          3      0.7     |191.067|     ±0.5         -9.8%   ivb44/fsmark/1x-64t-4BRD_12G-RAID5-f2fs-4M-30G-fsyncBeforeClose

NOTE: here is some more information about those test parameters, to help
you better understand what the testcase does (a rough sketch of the
per-file operation follows this list):

1x: 'x' means the number of iterations (loops), corresponding to the '-L'
option of fsmark

1t, 64t: 't' means the number of threads

4M: the size of a single file, corresponding to the '-s' option of fsmark
40G, 30G, 120G: the total test size

4BRD_12G: BRD is the ramdisk; '4' means 4 ramdisks and '12G' means the
size of each ramdisk, so 48G in total. A RAID array is then built on
top of those ramdisks.
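
fsyncBeforeClose: the sync method; presumably every file gets an fsync
before it is closed. A rough userspace sketch of the per-file operation
implied by the '4M' and 'fsyncBeforeClose' parameters (illustrative
only, not the fs_mark source; write_one_file and the path are made-up
names):

#include <fcntl.h>
#include <stdlib.h>
#include <unistd.h>

#define FILE_SIZE	(4UL << 20)	/* 4M per file, the fsmark '-s' size */

/* create one file, write it in full, fsync, then close */
static int write_one_file(const char *path)
{
	char *buf = calloc(1, FILE_SIZE);
	int fd, ret = -1;

	if (!buf)
		return -1;

	fd = open(path, O_CREAT | O_WRONLY | O_TRUNC, 0644);
	if (fd < 0)
		goto out;

	if (write(fd, buf, FILE_SIZE) == (ssize_t)FILE_SIZE &&
	    fsync(fd) == 0)			/* the "fsyncBeforeClose" step */
		ret = 0;

	close(fd);
out:
	free(buf);
	return ret;
}

int main(void)
{
	/* the real benchmark runs 64 such writers concurrently */
	return write_one_file("/tmp/fsmark-demo") ? 1 : 0;
}

In the real run, 64 such writers work concurrently on the md RAID device
built from the ramdisks, so the fsync/writeback path in f2fs is hit from
all of them at once.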


The change is a bit interesting, as the commit message clearly states
that this patch aims at a performance gain. The patch itself is simple,
too: it removes a mutex lock. So the only reasonable cause I can think
of, without digging too deep, is that removing this lock reduces sleep
time and lets more processes run, but somehow increases context switches
and CPU usage somewhere along the way (see the sketch after the numbers
below). I guess this is what the following changes are trying to tell us:

29708 ± 2% +5720.0% 1729051 ± 1% fsmark.time.voluntary_context_switches
302 ± 0% +113.8% 647 ± 0% fsmark.time.percent_of_cpu_this_job_got
61.05 ± 0% +214.0% 191.70 ± 0% fsmark.time.system_time
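
To make the serialization point above a bit more concrete, here is a
minimal userspace sketch, not the f2fs code (names like writepages_lock,
writer and write_one_unit are made up), of 64 writer threads that either
funnel through one mutex or run unserialized. It only shows how counters
analogous to the ones reported above (voluntary context switches, system
time) can be observed for the two locking modes:

/*
 * Userspace sketch only, not the f2fs code: 64 writer threads either
 * funnel through a single mutex (serialize = 1, the old behaviour) or
 * run unserialized (serialize = 0, the patched behaviour).  getrusage()
 * exposes counters analogous to the fsmark.time.* numbers above.
 *
 * build: gcc -O2 -pthread sketch.c -o sketch
 */
#include <pthread.h>
#include <stdio.h>
#include <stdlib.h>
#include <sys/resource.h>
#include <unistd.h>

#define NTHREADS	64
#define UNITS		100

static int serialize = 1;
static pthread_mutex_t writepages_lock = PTHREAD_MUTEX_INITIALIZER;

/* stand-in for submitting one chunk of writeback and waiting for it */
static void write_one_unit(void)
{
	usleep(1000);
}

static void *writer(void *arg)
{
	(void)arg;
	for (int i = 0; i < UNITS; i++) {
		if (serialize)
			pthread_mutex_lock(&writepages_lock);
		write_one_unit();
		if (serialize)
			pthread_mutex_unlock(&writepages_lock);
	}
	return NULL;
}

int main(int argc, char **argv)
{
	pthread_t tid[NTHREADS];
	struct rusage ru;
	int i;

	if (argc > 1)
		serialize = atoi(argv[1]);

	for (i = 0; i < NTHREADS; i++)
		pthread_create(&tid[i], NULL, writer, NULL);
	for (i = 0; i < NTHREADS; i++)
		pthread_join(tid[i], NULL);

	getrusage(RUSAGE_SELF, &ru);
	printf("serialize=%d voluntary_ctxt_switches=%ld involuntary=%ld system_time=%ld.%06lds\n",
	       serialize, ru.ru_nvcsw, ru.ru_nivcsw,
	       (long)ru.ru_stime.tv_sec, (long)ru.ru_stime.tv_usec);
	return 0;
}

Running it both ways (./sketch 1 vs ./sketch 0) is enough to see those
counters move with the locking mode; the real f2fs case of course also
involves the block layer and the md RAID device.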


FYI, here I list all the changed metrics for the most outstanding case:

3 0.3 |490.800| ±5.7 3 0.5 |262.067| ±0.4 -46.6% ivb44/fsmark/1x-64t-4BRD_12G-RAID0-f2fs-4M-30G-fsyncBeforeClose

3402e87cfb5e762f          78373b7319abdf15050af5b163
----------------          --------------------------
         %stddev      %change          %stddev
               \            |                \
29708 ± 2% +5720.0% 1729051 ± 1% fsmark.time.voluntary_context_switches
61.05 ± 0% +214.0% 191.70 ± 0% fsmark.time.system_time
302 ± 0% +113.8% 647 ± 0% fsmark.time.percent_of_cpu_this_job_got
10476 ± 0% +95.4% 20467 ± 5% fsmark.time.minor_page_faults
490 ± 5% -46.6% 262 ± 0% fsmark.files_per_sec
20.21 ± 0% +46.7% 29.65 ± 0% fsmark.time.elapsed_time
20.21 ± 0% +46.7% 29.65 ± 0% fsmark.time.elapsed_time.max
226379 ± 0% +32.5% 299882 ± 0% fsmark.app_overhead
0 ± 0% +Inf% 1045 ± 2% proc-vmstat.numa_pages_migrated
209 ± 26% +3272.3% 7059 ± 3% cpuidle.C1E-IVT.usage
228 ± 42% +686.7% 1799 ± 14% numa-meminfo.node0.Writeback
14633 ± 5% +7573.2% 1122849 ± 1% cpuidle.C1-IVT.usage
0 ± 0% +Inf% 1045 ± 2% proc-vmstat.pgmigrate_success
29708 ± 2% +5720.0% 1729051 ± 1% time.voluntary_context_switches
55663 ± 0% +776.9% 488081 ± 0% cpuidle.C6-IVT.usage
56 ± 42% +718.8% 464 ± 11% numa-vmstat.node0.nr_writeback
535 ± 29% +334.4% 2325 ± 10% meminfo.Writeback
129 ± 30% +295.6% 511 ± 4% proc-vmstat.nr_writeback
59.25 ± 5% -74.2% 15.26 ± 3% turbostat.CPU%c6
2.58 ± 8% -74.5% 0.66 ± 11% turbostat.Pkg%pc2
1.551e+08 ± 14% +233.4% 5.171e+08 ± 4% cpuidle.C1-IVT.time
32564 ± 24% +208.1% 100330 ± 5% softirqs.RCU
61.05 ± 0% +214.0% 191.70 ± 0% time.system_time
60 ± 32% +165.7% 160 ± 16% numa-vmstat.node1.nr_writeback
2 ± 0% +200.0% 6 ± 0% vmstat.procs.r
3057 ± 2% +166.1% 8136 ± 22% numa-vmstat.node0.nr_mapped
12240 ± 2% +165.9% 32547 ± 22% numa-meminfo.node0.Mapped
6324 ± 3% +148.4% 15709 ± 0% proc-vmstat.nr_mapped
25318 ± 3% +148.6% 62931 ± 0% meminfo.Mapped
302 ± 0% +113.8% 647 ± 0% time.percent_of_cpu_this_job_got
34.39 ± 8% +102.9% 69.79 ± 0% turbostat.CPU%c1
3247 ± 32% +124.9% 7303 ± 14% numa-vmstat.node0.nr_active_anon
12994 ± 32% +124.8% 29215 ± 14% numa-meminfo.node0.Active(anon)
3256 ± 4% +132.5% 7569 ± 23% numa-vmstat.node1.nr_mapped
3252 ± 32% +124.6% 7304 ± 14% numa-vmstat.node0.nr_anon_pages
13036 ± 4% +132.3% 30279 ± 23% numa-meminfo.node1.Mapped
13033 ± 31% +124.1% 29214 ± 14% numa-meminfo.node0.AnonPages
10476 ± 0% +95.4% 20467 ± 5% time.minor_page_faults
121752 ± 7% -35.8% 78122 ± 2% numa-meminfo.node0.Dirty
30381 ± 7% -36.0% 19456 ± 2% numa-vmstat.node0.nr_dirty
113996 ± 4% +64.4% 187416 ± 0% proc-vmstat.pgfault
59301 ± 1% +60.9% 95414 ± 1% softirqs.SCHED
1.92 ± 1% -31.9% 1.31 ± 4% turbostat.Pkg%pc6
16 ± 35% +54.2% 24 ± 10% cpuidle.POLL.usage
20.21 ± 0% +46.7% 29.65 ± 0% time.elapsed_time.max
20.21 ± 0% +46.7% 29.65 ± 0% time.elapsed_time
7114 ± 2% +46.3% 10405 ± 0% proc-vmstat.nr_active_anon
28499 ± 2% +46.2% 41659 ± 0% meminfo.Active(anon)
28617 ± 2% +45.7% 41692 ± 0% meminfo.AnonPages
7138 ± 2% +45.7% 10403 ± 0% proc-vmstat.nr_anon_pages
95342 ± 2% +44.9% 138177 ± 6% numa-meminfo.node0.Active
82347 ± 2% +32.3% 108962 ± 9% numa-meminfo.node0.Active(file)
20586 ± 2% +32.3% 27240 ± 9% numa-vmstat.node0.nr_active_file
233023 ± 3% -30.7% 161486 ± 0% meminfo.Dirty
0.19 ± 2% +36.2% 0.26 ± 3% time.user_time
57900 ± 3% -30.5% 40219 ± 0% proc-vmstat.nr_dirty
108006 ± 8% -24.6% 81460 ± 2% numa-meminfo.node1.Dirty
26896 ± 8% -24.4% 20331 ± 2% numa-vmstat.node1.nr_dirty
219977 ± 0% +32.9% 292456 ± 1% softirqs.TIMER
197382 ± 1% +32.6% 261733 ± 0% meminfo.Active
42210 ± 1% +30.2% 54947 ± 0% proc-vmstat.nr_active_file
168883 ± 1% +30.3% 220073 ± 0% meminfo.Active(file)
353 ± 4% +21.7% 430 ± 3% numa-vmstat.node0.nr_page_table_pages
1425 ± 4% +20.2% 1714 ± 4% numa-meminfo.node0.PageTables
86494 ± 2% +28.0% 110745 ± 10% numa-meminfo.node1.Active(file)
21621 ± 2% +28.0% 27683 ± 10% numa-vmstat.node1.nr_active_file
46.00 ± 4% +20.8% 55.56 ± 3% uptime.boot
101939 ± 4% +20.8% 123117 ± 8% numa-meminfo.node1.Active
1412 ± 9% +10.8% 1565 ± 8% numa-vmstat.node0.nr_alloc_batch
90644 ± 6% +22.4% 110984 ± 0% proc-vmstat.pgfree
2084 ± 4% +15.7% 2412 ± 3% uptime.idle
18666 ± 1% +9.2% 20390 ± 4% slabinfo.anon_vma.num_objs
18663 ± 1% +9.3% 20390 ± 4% slabinfo.anon_vma.active_objs
435.51 ± 0% -89.9% 43.95 ± 0% iostat.md0.avgrq-sz
6415 ± 0% +875.4% 62579 ± 1% iostat.md0.w/s
5595 ± 1% +1883.4% 110979 ± 1% vmstat.system.cs
280678 ± 0% +315.2% 1165516 ± 0% iostat.md0.avgqu-sz
6.31 ± 0% +123.9% 14.14 ± 0% turbostat.%Busy
195 ± 0% +116.4% 423 ± 0% turbostat.Avg_MHz
26035 ± 0% +76.8% 46037 ± 0% vmstat.system.in
1370424 ± 0% -28.1% 985009 ± 0% vmstat.io.bo
1459306 ± 0% -28.1% 1048622 ± 0% iostat.md0.wkB/s
72.28 ± 0% +29.6% 93.70 ± 0% turbostat.CorWatt
102 ± 0% +21.4% 124 ± 0% turbostat.PkgWatt
6.66 ± 2% +8.1% 7.19 ± 0% turbostat.RAMWatt
