Re: [PATCH] mm, compaction: direct freepage allocation for async direct compaction

From: Andrew Morton
Date: Tue Nov 28 2017 - 18:34:23 EST


On Wed, 22 Nov 2017 15:52:55 +0100 Vlastimil Babka <vbabka@xxxxxxx> wrote:

> On 11/22/2017 03:33 PM, Johannes Weiner wrote:
> > From: Vlastimil Babka <vbabka@xxxxxxx>
> >
> > The goal of direct compaction is to quickly make a high-order page available
> > for the pending allocation. The free page scanner can add significant latency
> > when searching for migration targets, although to succeed the compaction, the
> > only important limit on the target free pages is that they must not come from
> > the same order-aligned block as the migrated pages.
> >
> > This patch therefore makes direct async compaction allocate freepages directly
> > from freelists. Pages that do come from the same block (which we cannot simply
> > exclude from the freelist allocation) are put on separate list and released
> > only after migration to allow them to merge.
> >
> > In addition to reduced stall, another advantage is that we split larger free
> > pages for migration targets only when smaller pages are depleted, while the
> > free scanner can split pages up to (order - 1) as it encouters them. However,
> > this approach likely sacrifices some of the long-term anti-fragmentation
> > features of a thorough compaction, so we limit the direct allocation approach
> > to direct async compaction.
> >
> > For observational purposes, the patch introduces two new counters to
> > /proc/vmstat. compact_free_direct_alloc counts how many pages were allocated
> > directly without scanning, and compact_free_direct_miss counts the subset of
> > these allocations that were from the wrong range and had to be held on the
> > separate list.
> >
> > Signed-off-by: Vlastimil Babka <vbabka@xxxxxxx>
> > Signed-off-by: Johannes Weiner <hannes@xxxxxxxxxxx>
> > ---
> >
> > Hi. I'm resending this because we've been struggling with the cost of
> > compaction in our fleet, and this patch helps substantially.
> >
> > On 128G+ machines, we have seen isolate_freepages_block() eat up 40%
> > of the CPU cycles and scanning up to a billion PFNs per minute. Not in
> > a spike, but continuously, to service higher-order allocations from
> > the network stack, fork (non-vmap stacks), THP, etc. during regular
> > operation.
> >
> > I've been running this patch on a handful of less-affected but still
> > pretty bad machines for a week, and the results look pretty great:
> >
> > http://cmpxchg.org/compactdirectalloc/compactdirectalloc.png
>
> Thanks a lot, that's very encouraging!

Yup.

Should we proceed with this patch for now, or wait for something better
to come along?