Re: [PATCH V2 0/6] mm: page_alloc: freelist migratetype hygiene

From: Johannes Weiner
Date: Mon Oct 16 2023 - 14:51:22 EST


On Mon, Oct 16, 2023 at 11:00:33AM -0400, Zi Yan wrote:
> On 16 Oct 2023, at 10:37, Johannes Weiner wrote:
>
> > On Mon, Oct 16, 2023 at 09:35:34AM -0400, Zi Yan wrote:
> >>> The attached patch has all the suggested changes, let me know how it
> >>> looks to you. Thanks.
> >>
> >> The one I sent has free page accounting issues. The attached one fixes them.
> >
> > Do you still have the warnings? I wonder what went wrong.
>
> No warnings. But something with the code:
>
> 1. in your version, split_free_page() is called without changing any pageblock
> migratetypes, then split_free_page() is just a no-op, since the page is
> just deleted from the free list, then freed via different orders. Buddy allocator
> will merge them back.

Hm not quite.

If it's the tail block of a buddy, I update its type before
splitting. The splitting loop looks up the type of each block for
sorting it onto freelists.

If it's the head block, yes I split it first according to its old
type. But then I let it fall through to scanning the block, which will
find that buddy, update its type and move it.

> 2. in my version, I set pageblock migratetype to new_mt before split_free_page(),
> but it causes free page accounting issues, since in the case of head, free pages
> are deleted from new_mt when they are in old_mt free list and the accounting
> decreases new_mt free page number instead of old_mt one.

Right, that makes sense.

> Basically, split_free_page() is awkward as it relies on preset migratetypes,
> which changes migratetypes without deleting the free pages from the list first.
> That is why I came up with the new split_free_page() below.

Yeah, the in-between thing is bad. Either it fixes the migratetype
before deletion, or it doesn't do the deletion. I'm thinking it would
be simpler to move the deletion out instead.

> >> @@ -883,6 +886,10 @@ int split_free_page(struct page *free_page,
> >> mt = get_pfnblock_migratetype(free_page, free_page_pfn);
> >> del_page_from_free_list(free_page, zone, order, mt);
> >>
> >> + set_pageblock_migratetype(free_page, mt1);
> >> + set_pageblock_migratetype(pfn_to_page(free_page_pfn + split_pfn_offset),
> >> + mt2);
> >> +
> >> for (pfn = free_page_pfn;
> >> pfn < free_page_pfn + (1UL << order);) {
> >> int mt = get_pfnblock_migratetype(pfn_to_page(pfn), pfn);
> >
> > I don't think this is quite right.
> >
> > With CONFIG_ARCH_FORCE_MAX_ORDER it's possible that we're dealing with
> > a buddy that is more than two blocks:
> >
> > [pageblock 0][pageblock 1][pageblock 2][pageblock 3]
> > [buddy ]
> > [isolate range ..
> >
> > That for loop splits the buddy into 4 blocks. The code above would set
> > pageblock 0 to old_mt, and pageblock 1 to new_mt. But it should only
> > set pageblock 3 to new_mt.
>
> OK. I think I need to fix split_free_page().
>
> Hmm, if CONFIG_ARCH_FORCE_MAX_ORDER can make a buddy have more than one
> pageblock and in turn makes an in-use page have more than one pageblock,
> we will have problems. Since in isolate_single_pageblock(), an in-use page
> can have part of its pageblock set to a different migratetype and be freed,
> causing the free page with unmatched migratetypes. We might need to
> free pages at pageblock_order if their orders are bigger than pageblock_order.

Is this a practical issue? You mentioned that right now only gigantic
pages can be larger than a pageblock, and those are freed in order-0
chunks.

> > How about pulling the freelist removal out of split_free_page()?
> >
> > del_page_from_freelist(huge_buddy);
> > set_pageblock_migratetype(start_page, MIGRATE_ISOLATE);
> > split_free_page(huge_buddy, buddy_order(), pageblock_nr_pages);
> > return pageblock_nr_pages;
>
> Yes, this is better. Let me change to this implementation.
>
> But I would like to test it on an environment where a buddy contains more than
> one pageblocks first. I probably can change MAX_ORDER of x86_64 to do it locally.
> I will report back.

I tweaked my version some more based on our discussion. Would you mind
taking a look? It survived an hour of stressing with a kernel build
and Mike's reproducer that allocates gigantics and demotes them.

Note that it applies *before* consolidating of the free counts, as
isolation needs to be fixed before the warnings are added, to avoid
bisectability issues. The consolidation patch doesn't change it much,
except removing freepage accounting in move_freepages_block_isolate().

---