Re: [patch] mm, compaction: make sure freeing scanner isn't persistently expensive

From: Vlastimil Babka
Date: Thu Jun 30 2016 - 03:42:48 EST


On 06/30/2016 09:31 AM, Joonsoo Kim wrote:
On Wed, Jun 29, 2016 at 01:55:55PM -0700, David Rientjes wrote:
On Wed, 29 Jun 2016, Vlastimil Babka wrote:

On 06/29/2016 03:39 AM, David Rientjes wrote:
It's possible that the freeing scanner can be consistently expensive if
memory is well compacted toward the end of the zone with few free pages
available in that area.

If all zone memory is synchronously compacted, say with
/proc/sys/vm/compact_memory, and thp is faulted, it is possible to
iterate a massive amount of memory even with the per-zone cached free
position.

For example, after compacting all memory and faulting thp for heap, it
was observed that compact_free_scanned increased as much as 892518911 4KB
pages while compact_stall only increased by 171. The freeing scanner
iterated ~20GB of memory for each compaction stall.

To address this, if too much memory is spanned on the freeing scanner's
freelist when releasing back to the system, return the low pfn rather than
the high pfn. It's declared that the freeing scanner will become too
expensive if the high pfn is used, so use the low pfn instead.

The amount of memory declared as too expensive to iterate is subjectively
chosen at COMPACT_CLUSTER_MAX << PAGE_SHIFT, which is 512MB with 4KB
pages.

Signed-off-by: David Rientjes <rientjes@xxxxxxxxxx>

Hmm, I don't know. Seems it only works around one corner case of a larger
issue. The cost for the scanning was already paid, the patch prevents it from
being paid again, but only until the scanners are reset.


The only point of the per-zone cached pfn positions is to avoid doing the
same work again unnecessarily. Having the last 16GB of memory at the end
of a zone being completely unfree is the same as a single page in the last
pageblock free. The number of PageBuddy pages in that amount of memory
can be irrelevant up to COMPACT_CLUSTER_MAX. We simply can't afford to
scan 16GB of memory looking for free pages.

We need to find a root cause of this problem, first.

I guess that this problem would happen when isolate_freepages_block()
early stop due to watermark check (if your patch is applied to your
kernel). If scanner meets, cached pfn will be reset and your patch
doesn't have any effect. So, I guess that scanner doesn't meet.

We enter the compaction with enough free memory so stop in
isolate_freepages_block() should be unlikely event but your number
shows that it happens frequently?

If it's THP faults, it could be also due to need_resched() or lock contention?

Maybe, if we change all watermark check on compaction.c to use
min_wmark, problem would be disappeared.

Basically patches 13 and 16 in https://lkml.org/lkml/2016/6/24/222

Anyway, could you check how often isolate_freepages_block() is stopped
and why?

In addition, I worry that your previous patch that makes
isolate_freepages_block() stop when watermark doesn't meet would cause
compaction non-progress. Amount of free memory can be flutuated so
watermark fail would be temporaral. We need to break compaction in
this case? It would decrease compaction success rate if there is a
memory hogger in parallel. Any idea?

I think it's better to stop and possibly switch to reclaim (or give up for THP's) than to continue hoping that somebody would free the memory for us. As I explained in the other thread, even if we removed watermark check completely and migration succeeded and formed high-order page, compact_finished() would see failed high-order watermark and return COMPACT_CONTINUE, even if the problem is actually order-0 watermarks. So maybe success rate would be bigger, but at enormous cost. IIRC you even proposed once to add order-0 check (maybe even with some gap like compaction_suitable()?) to compact_finished() that would terminate compaction. Which shouldn't be necessary if we terminate due to split_free_page() failing.

Thanks.


Note also that THP's no longer do direct compaction by default in recent
kernels.

To fully solve the freepage scanning issue, we should probably pick and finish
one of the proposed reworks from Joonsoo or myself, or the approach that
replaces free scanner with direct freelist allocations.


Feel free to post the patches, but I believe this simple change makes
release_freepages() exceedingly better and can better target memory for
the freeing scanner.

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@xxxxxxxxxx For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@xxxxxxxxx";> email@xxxxxxxxx </a>