[RFC 0/3] reduce latency of direct async compaction

From: Vlastimil Babka
Date: Thu Dec 03 2015 - 03:11:51 EST


The goal is to reduce latency (and increase success) of direct async compaction
by making it focus more on the goal of creating a high-order page, at the
expense of thoroughness. This should be useful for example for THP allocations
where we still get reports of being too expensive, most recently [2].

This is based on an older attempt [1] which I didn't finish as it seemed that
it increased longer-term fragmentation. Now it seems it doesn't, but I'll have
to test more properly. This patch (2) makes migration scanner skip whole
order-aligned blocks where isolation fails, as it takes just one unmigrated
page to prevent a high-order page from merging.

Patch 3 tries to reduce the excessive freepage scanning (such as in [3]) by
allocating migration targets from freelist. We just need to be sure that the
pages are not from the same block as the migrated pages. This is also limited
to direct async compaction and is not meant to replace a (potentially
redesigned) free scanner for other scenarios.

Early tests with stress-highalloc configured to simulate THP allocations:

4.4-rc2 4.4-rc2 4.4-rc2 4.4-rc2
0-test 1-test 2-test 3-test
Success 1 Min 1.00 ( 0.00%) 2.00 (-100.00%) 2.00 (-100.00%) 3.00 (-200.00%)
Success 1 Mean 3.00 ( 0.00%) 3.00 ( 0.00%) 2.80 ( 6.67%) 4.80 (-60.00%)
Success 1 Max 6.00 ( 0.00%) 4.00 ( 33.33%) 5.00 ( 16.67%) 7.00 (-16.67%)
Success 2 Min 1.00 ( 0.00%) 3.00 (-200.00%) 4.00 (-300.00%) 8.00 (-700.00%)
Success 2 Mean 3.80 ( 0.00%) 4.00 ( -5.26%) 5.20 (-36.84%) 11.00 (-189.47%)
Success 2 Max 8.00 ( 0.00%) 7.00 ( 12.50%) 6.00 ( 25.00%) 13.00 (-62.50%)
Success 3 Min 58.00 ( 0.00%) 69.00 (-18.97%) 53.00 ( 8.62%) 66.00 (-13.79%)
Success 3 Mean 67.40 ( 0.00%) 74.00 ( -9.79%) 58.20 ( 13.65%) 68.80 ( -2.08%)
Success 3 Max 74.00 ( 0.00%) 78.00 ( -5.41%) 70.00 ( 5.41%) 72.00 ( 2.70%)

4.4-rc2 4.4-rc2 4.4-rc2 4.4-rc2
0-test 1-test 2-test 3-test
User 3167.23 3140.58 3198.77 3049.85
System 1166.65 1158.64 1171.06 1140.18
Elapsed 1827.63 1737.69 1750.62 1793.82

4.4-rc2 4.4-rc2 4.4-rc2 4.4-rc2
0-test 1-test 2-test 3-test
Minor Faults 107184766 107311664 107366319 108425875
Major Faults 753 730 746 817
Swap Ins 188 346 243 287
Swap Outs 7278 6186 6226 5702
Allocation stalls 988 868 1104 846
DMA allocs 25 18 15 13
DMA32 allocs 75074785 75104070 75131502 76260816
Normal allocs 26112454 26193770 26142374 26291337
Movable allocs 0 0 0 0
Direct pages scanned 83996 82251 80523 93509
Kswapd pages scanned 2122511 2107947 2110599 2121951
Kswapd pages reclaimed 2031597 2006468 2011184 2052483
Direct pages reclaimed 83806 82162 80315 93275
Kswapd efficiency 95% 95% 95% 96%
Kswapd velocity 1217.211 1202.789 1211.116 1189.075
Direct efficiency 99% 99% 99% 99%
Direct velocity 48.170 46.932 46.206 52.400
Percentage direct scans 3% 3% 3% 4%
Zone normal velocity 301.196 301.273 297.286 308.598
Zone dma32 velocity 964.185 948.448 960.036 932.877
Zone dma velocity 0.000 0.000 0.000 0.000
Page writes by reclaim 7296.200 6187.400 6226.800 5702.600
Page writes file 18 1 0 0
Page writes anon 7278 6186 6226 5702
Page reclaim immediate 259 225 41 180
Sector Reads 4132945 4074422 4099737 4291996
Sector Writes 11066128 11057103 11066448 11083256
Page rescued immediate 0 0 0 0
Slabs scanned 1539471 1521153 1518145 1776426
Direct inode steals 8482 3717 6096 9832
Kswapd inode steals 37735 42700 39976 43492
Kswapd skipped wait 0 0 0 0
THP fault alloc 593 610 680 778
THP collapse alloc 340 294 335 393
THP splits 4 2 4 3
THP fault fallback 751 748 705 626
THP collapse fail 14 16 14 12
Compaction stalls 6464 6373 6743 6451
Compaction success 518 688 575 972
Compaction failures 5945 5684 6167 5479
Page migrate success 318176 313488 239637 595224
Page migrate failure 40983 46106 12171 2587
Compaction pages isolated 733684 735737 564719 713799
Compaction migrate scanned 1101427 1056870 603977 969346
Compaction free scanned 17736383 15328486 11999748 5269641
Compaction cost 352 347 263 638
NUMA alloc hit 99632716 99690283 99753018 100771746
NUMA alloc miss 0 0 0 0
NUMA interleave hit 0 0 0 0
NUMA alloc local 99632716 99690283 99753018 100771746
NUMA base PTE updates 0 0 0 0
NUMA huge PMD updates 0 0 0 0
NUMA page range updates 0 0 0 0
NUMA hint faults 0 0 0 0
NUMA hint local faults 0 0 0 0
NUMA hint local percent 100 100 100 100
NUMA pages migrated 0 0 0 0
AutoNUMA cost 0% 0% 0% 0%

Migrate scanned pages are reduced by patch 2 as expected thanks to the skipping.
Patch 3 reduces free scanned pages significantly, and improves compaction
success and THP fault allocs (of the interfering activity, not the alloc test
itself). That results in more migrate scanner activity, as more success means
less deferring, and time spent previously in free sacanner can now be used in
migration scanner.

"Success 3" is indication of long-term fragmentation (the interference is
ceased in this phase) and it looks quite unstable overall (there shouldn't be
such difference between base and patch 1) but it doesn't seem decreased. I'm
suspecting it's the lack of reset_isolation_suitable() when the only activity
is async compaction. Needs more evaluation.

Aaron, could you try this on your testcase?

[1] https://lkml.org/lkml/2014/7/16/988
[2] http://www.spinics.net/lists/linux-mm/msg97378.html
[3] http://www.spinics.net/lists/linux-mm/msg97475.html

Vlastimil Babka (3):
mm, compaction: reduce spurious pcplist drains
mm, compaction: make async direct compaction skip blocks where
isolation fails
mm, compaction: direct freepage allocation for async direct compaction

include/linux/vm_event_item.h | 1 +
mm/compaction.c | 122 +++++++++++++++++++++++++++++++++++-------
mm/internal.h | 4 ++
mm/page_alloc.c | 27 ++++++++++
mm/vmstat.c | 2 +
5 files changed, 137 insertions(+), 19 deletions(-)

--
2.6.3

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/