Re: [PATCH] mm: vmscan: Stop reclaim/compaction earlier due to insufficient progress if !__GFP_REPEAT

From: Minchan Kim
Date: Wed Feb 16 2011 - 18:26:26 EST


On Wed, Feb 16, 2011 at 6:50 PM, Mel Gorman <mel@xxxxxxxxx> wrote:
> should_continue_reclaim() for reclaim/compaction allows scanning to continue
> even if pages are not being reclaimed until the full list is scanned. In
> terms of allocation success, this makes sense but potentially it introduces
> unwanted latency for high-order allocations such as transparent hugepages
> and network jumbo frames that would prefer to fail the allocation attempt
> and fallback to order-0 pages. ÂWorse, there is a potential that the full
> LRU scan will clear all the young bits, distort page aging information and
> potentially push pages into swap that would have otherwise remained resident.
>
> This patch will stop reclaim/compaction if no pages were reclaimed in the
> last SWAP_CLUSTER_MAX pages that were considered. For allocations such as
> hugetlbfs that use __GFP_REPEAT and have fewer fallback options, the full LRU
> list may still be scanned.
>
> To test this, a tool was developed based on ftrace that tracked the latency of
> high-order allocations while transparent hugepage support was enabled and three
> benchmarks were run. The "fix-infinite" figures are 2.6.38-rc4 with Johannes's
> patch "vmscan: fix zone shrinking exit when scan work is done" applied.
>
> STREAM Highorder Allocation Latency Statistics
>                fix-infinite    break-early
> 1 :: Count            10298          10229
> 1 :: Min             0.4560         0.4640
> 1 :: Mean            1.0589         1.0183
> 1 :: Max            14.5990        11.7510
> 1 :: Stddev          0.5208         0.4719
> 2 :: Count                2              1
> 2 :: Min             1.8610         3.7240
> 2 :: Mean            3.4325         3.7240
> 2 :: Max             5.0040         3.7240
> 2 :: Stddev          1.5715         0.0000
> 9 :: Count           111696         111694
> 9 :: Min             0.5230         0.4110
> 9 :: Mean           10.5831        10.5718
> 9 :: Max            38.4480        43.2900
> 9 :: Stddev          1.1147         1.1325
>
> Mean time for order-1 allocations is reduced. order-2 looks increased
> but with so few allocations, it's not particularly significant. THP mean
> allocation latency is also reduced. That said, allocation time varies so
> significantly that the reductions are within noise.
>
> Max allocation time is reduced by a significant amount for low-order
> allocations but increased for THP allocations which presumably are now
> breaking before reclaim has done enough work.
>
> SysBench Highorder Allocation Latency Statistics
>                fix-infinite    break-early
> 1 :: Count            15745          15677
> 1 :: Min             0.4250         0.4550
> 1 :: Mean            1.1023         1.0810
> 1 :: Max            14.4590        10.8220
> 1 :: Stddev          0.5117         0.5100
> 2 :: Count                1              1
> 2 :: Min             3.0040         2.1530
> 2 :: Mean            3.0040         2.1530
> 2 :: Max             3.0040         2.1530
> 2 :: Stddev          0.0000         0.0000
> 9 :: Count             2017           1931
> 9 :: Min             0.4980         0.7480
> 9 :: Mean           10.4717        10.3840
> 9 :: Max            24.9460        26.2500
> 9 :: Stddev          1.1726         1.1966
>
> Again, mean time for order-1 allocations is reduced while order-2 allocations
> are too few to draw conclusions from. The mean time for THP allocations is
> also slightly reduced, albeit the reductions are within variance.
>
> Once again, our maximum allocation time is significantly reduced for
> low-order allocations and slightly increased for THP allocations.
>
> Anon stream mmap reference Highorder Allocation Latency Statistics
>                fix-infinite    break-early
> 1 :: Count             1376           1790
> 1 :: Min             0.4940         0.5010
> 1 :: Mean            1.0289         0.9732
> 1 :: Max             6.2670         4.2540
> 1 :: Stddev          0.4142         0.2785
> 2 :: Count                1              -
> 2 :: Min             1.9060              -
> 2 :: Mean            1.9060              -
> 2 :: Max             1.9060              -
> 2 :: Stddev          0.0000              -
> 9 :: Count            11266          11257
> 9 :: Min             0.4990         0.4940
> 9 :: Mean        27250.4669     24256.1919
> 9 :: Max      11439211.0000   6008885.0000
> 9 :: Stddev    226427.4624    186298.1430
>
> This benchmark creates one thread per CPU which references an amount of
> anonymous memory 1.5 times the size of physical RAM. This pounds swap quite
> heavily and is intended to exercise THP a bit.
>
> Mean allocation time for order-1 is reduced as before. It's also reduced
> for THP allocations but the variations here are pretty massive due to swap.
> As before, maximum allocation times are significantly reduced.
>
> Overall, the patch reduces the mean and maximum allocation latencies for
> the smaller high-order allocations. This was with SLAB configured, so the
> improvement would be expected to be more significant with SLUB, which uses
> these allocation sizes more aggressively.
>
> The mean allocation times for THP allocations are also slightly reduced.
> The maximum latency was slightly increased, as predicted by the comments, due
> to reclaim/compaction breaking early. However, workloads care more about the
> latency of lower-order allocations than of THP, so it's an acceptable trade-off.
> Please consider merging for 2.6.38.
>
> Signed-off-by: Mel Gorman <mel@xxxxxxxxx>

> ---
>  mm/vmscan.c |   32 ++++++++++++++++++++++----------
>  1 files changed, 22 insertions(+), 10 deletions(-)
>
> diff --git a/mm/vmscan.c b/mm/vmscan.c
> index 148c6e6..591b907 100644
> --- a/mm/vmscan.c
> +++ b/mm/vmscan.c
> @@ -1841,16 +1841,28 @@ static inline bool should_continue_reclaim(struct zone *zone,
>         if (!(sc->reclaim_mode & RECLAIM_MODE_COMPACTION))
>                 return false;
>
> -       /*
> -        * If we failed to reclaim and have scanned the full list, stop.
> -        * NOTE: Checking just nr_reclaimed would exit reclaim/compaction far
> -        *       faster but obviously would be less likely to succeed
> -        *       allocation. If this is desirable, use GFP_REPEAT to decide

Typo: should be __GFP_REPEAT.
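
BTW, just to confirm my understanding of the new stopping rule, since the
added lines of the hunk are cut off in my quote above: from the changelog I
read it as something like the sketch below. This is only a rough user-space
model of the described behaviour, not the actual patch; the function name
should_continue_reclaim_model() and the GFP_REPEAT_MODEL flag are stand-ins.

	/*
	 * Rough model of the stopping rule described in the changelog
	 * (an assumption, not the real kernel hunk).
	 */
	#include <stdbool.h>
	#include <stdio.h>

	#define SWAP_CLUSTER_MAX	32UL	/* one reclaim scan window */
	#define GFP_REPEAT_MODEL	0x1U	/* stand-in for __GFP_REPEAT */

	/* Return true if reclaim/compaction should keep going after a window. */
	static bool should_continue_reclaim_model(unsigned int gfp_mask,
						  unsigned long nr_reclaimed,
						  unsigned long nr_scanned)
	{
		if (gfp_mask & GFP_REPEAT_MODEL)
			/*
			 * __GFP_REPEAT callers (e.g. hugetlbfs) may scan the
			 * full LRU: only give up once a window scanned nothing
			 * at all and reclaimed nothing.
			 */
			return !(nr_reclaimed == 0 && nr_scanned == 0);

		/*
		 * Everyone else breaks early: stop as soon as the last
		 * SWAP_CLUSTER_MAX pages considered yielded no reclaimed
		 * pages, trading allocation success for lower latency.
		 */
		return nr_reclaimed != 0;
	}

	int main(void)
	{
		/* THP-style allocation (no __GFP_REPEAT): bail on no progress. */
		printf("THP, no progress: continue=%d\n",
		       should_continue_reclaim_model(0, 0, SWAP_CLUSTER_MAX));
		/* hugetlbfs-style allocation: keep scanning despite no progress. */
		printf("hugetlbfs, no progress: continue=%d\n",
		       should_continue_reclaim_model(GFP_REPEAT_MODEL, 0,
						     SWAP_CLUSTER_MAX));
		return 0;
	}

If that is the intent, the latency numbers above make sense to me.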

Otherwise, looks good to me.
Reviewed-by: Minchan Kim <minchan.kim@xxxxxxxxx>

--
Kind regards,
Minchan Kim