Re: [RFC PATCH 05/26] mm: page_alloc: per-migratetype pcplist for THPs

From: Mel Gorman
Date: Fri Apr 28 2023 - 06:29:49 EST


On Fri, Apr 21, 2023 at 11:06:48AM -0400, Johannes Weiner wrote:
> On Fri, Apr 21, 2023 at 01:47:44PM +0100, Mel Gorman wrote:
> > On Tue, Apr 18, 2023 at 03:12:52PM -0400, Johannes Weiner wrote:
> > > Right now, there is only one pcplist for THP allocations. However,
> > > while most THPs are movable, the huge zero page is not. This means a
> > > movable THP allocation can grab an unmovable block from the pcplist,
> > > and a subsequent THP split, partial free, and reallocation of the
> > > remainder will mix movable and unmovable pages in the block.
> > >
> > > While this isn't a huge source of block pollution in practice, it
> > > happens often enough to trigger debug warnings fairly quickly under
> > > load. In the interest of tightening up pageblock hygiene, make the THP
> > > pcplists fully migratetype-aware, just like the lower order ones.
> > >
> > > Signed-off-by: Johannes Weiner <hannes@xxxxxxxxxxx>
> >
> > Split out :P
> >
> > Take special care of this one because, while I didn't check this, I
> > suspect it'll push the PCP structure size into the next cache line and
> > increase overhead.
> >
> > The changelog makes it unclear why exactly this happens or why the
> > patch fixes it.
>
> Before this, I'd see warnings from the last patch in the series about
> received migratetype not matching requested mt.
>
> The way it happens is that the zero page gets freed and the unmovable
> block put on the pcplist. A regular THP allocation is subsequently
> served from an unmovable block.
>
> Mental note, I think this can happen the other way around too: a
> regular THP on the pcp being served to a MIGRATE_UNMOVABLE zero
> THP. It's not supposed to, but it looks like there is a bug in the
> code that's meant to prevent that from happening in rmqueue():
>
> 	if (likely(pcp_allowed_order(order))) {
> 		/*
> 		 * MIGRATE_MOVABLE pcplist could have the pages on CMA area and
> 		 * we need to skip it when CMA area isn't allowed.
> 		 */
> 		if (!IS_ENABLED(CONFIG_CMA) || alloc_flags & ALLOC_CMA ||
> 				migratetype != MIGRATE_MOVABLE) {
> 			page = rmqueue_pcplist(preferred_zone, zone, order,
> 					migratetype, alloc_flags);
> 			if (likely(page))
> 				goto out;
> 		}
> 	}
>
> Surely that last condition should be migratetype == MIGRATE_MOVABLE?
>

It should be. It would have been missed for ages because catching it needs
a test case based on a machine configuration that requires CMA for
functional correctness and also uses THP, which is an unlikely combination.
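
To make the difference concrete, here is a minimal user-space sketch that
puts the quoted condition next to the variant with the last test flipped.
The names (enum mt, use_pcp_current(), use_pcp_proposed()) are stand-ins
made up for illustration, not kernel code, and the sketch only models the
boolean condition, not the pcplists themselves.

```c
#include <stdbool.h>

/* Hypothetical stand-ins for the kernel's migratetypes; not the real enum. */
enum mt { MT_UNMOVABLE, MT_MOVABLE };

/*
 * Condition as quoted from rmqueue(); true means "try the pcplist".
 * cma_enabled models IS_ENABLED(CONFIG_CMA), alloc_cma models
 * (alloc_flags & ALLOC_CMA).
 */
static inline bool use_pcp_current(bool cma_enabled, bool alloc_cma, enum mt mt)
{
	return !cma_enabled || alloc_cma || mt != MT_MOVABLE;
}

/* Same condition with the last test flipped to ==, as suggested above. */
static inline bool use_pcp_proposed(bool cma_enabled, bool alloc_cma, enum mt mt)
{
	return !cma_enabled || alloc_cma || mt == MT_MOVABLE;
}
```

The two only disagree when CMA is enabled and ALLOC_CMA is clear: there the
quoted condition skips the pcplist for MIGRATE_MOVABLE requests, while the
== variant skips it for every other type. In all other configurations both
take the pcplist.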

> > The huge zero page strips GFP_MOVABLE (so unmovable)
> > but at allocation time, it doesn't really matter what the movable type
> > is because it's a full pageblock. It doesn't appear to be a hazard until
> > the split happens. Assuming that's the case, it should be ok to always
> > set the pageblock movable for THP allocations regardless of GFP flags at
> > allocation time or else set the pageblock MOVABLE at THP split (always
> > MOVABLE at allocation time makes more sense).
>
> The regular allocator compaction skips over compound pages anyway, so
> the migratetype should indeed not matter there.
>
> The bigger issue is CMA. alloc_contig_range() will try to move THPs to
> free a larger range. We have to be careful not to place an unmovable
> zero THP into a CMA region. That means we can not play games with MT -
> we really do have to physically keep unmovable and movable THPs apart.
>

Fair point.

> Another option would be not to use pcp for the zero THP. It's cached
> anyway in the caller. But it would add branches to the THP alloc and
> free fast paths (pcp_allowed_order() also checking migratetype).

And this is probably the most straightforward option. The intent behind
caching some THPs on PCP was to help when faulting large mappings of normal
THPs and to reduce contention on the zone lock a little. The zero THP is
somewhat special because it should not be allocated at high frequency.
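
As a rough sketch of what that option could look like, assuming a
pcp_allowed_order()-style helper that also takes the migratetype. The
helper name pcp_allowed(), the enum and the constant values below are
assumptions for illustration (order 9 is what you would see with 4K pages
on x86-64), not code from the series, and the cost of the extra branch is
untested.

```c
#include <stdbool.h>

/* Assumed values for illustration only; not taken from this thread. */
#define PAGE_ALLOC_COSTLY_ORDER	3
#define THP_ORDER		9	/* stand-in for pageblock_order */

/* Hypothetical stand-ins for the kernel's migratetype enum. */
enum mt { MT_UNMOVABLE, MT_MOVABLE, MT_RECLAIMABLE };

/*
 * Hypothetical variant of pcp_allowed_order() that also checks the
 * migratetype: low orders always use the pcp, THP-sized pages only
 * when they are movable.
 */
static inline bool pcp_allowed(unsigned int order, enum mt mt)
{
	if (order <= PAGE_ALLOC_COSTLY_ORDER)
		return true;
	if (order == THP_ORDER)
		return mt == MT_MOVABLE;	/* the extra fast-path branch */
	return false;
}
```

With something like this, the unmovable zero THP would bypass the pcp on
both allocation and free, so the single THP pcplist would only ever hold
movable blocks.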

--
Mel Gorman
SUSE Labs