Re: [PATCH V2 0/6] mm: page_alloc: freelist migratetype hygiene

From: Zi Yan
Date: Wed Sep 20 2023 - 22:31:41 EST


On 20 Sep 2023, at 13:23, Zi Yan wrote:

> On 20 Sep 2023, at 12:04, Johannes Weiner wrote:
>
>> On Wed, Sep 20, 2023 at 09:48:12AM -0400, Johannes Weiner wrote:
>>> On Wed, Sep 20, 2023 at 08:07:53AM +0200, Vlastimil Babka wrote:
>>>> On 9/20/23 03:38, Zi Yan wrote:
>>>>> On 19 Sep 2023, at 20:32, Mike Kravetz wrote:
>>>>>
>>>>>> On 09/19/23 16:57, Zi Yan wrote:
>>>>>>> On 19 Sep 2023, at 14:47, Mike Kravetz wrote:
>>>>>>>
>>>>>>>> --- a/mm/page_alloc.c
>>>>>>>> +++ b/mm/page_alloc.c
>>>>>>>> @@ -1651,8 +1651,13 @@ static bool prep_move_freepages_block(struct zone *zone, struct page *page,
>>>>>>>> end = pageblock_end_pfn(pfn) - 1;
>>>>>>>>
>>>>>>>> /* Do not cross zone boundaries */
>>>>>>>> +#if 0
>>>>>>>> if (!zone_spans_pfn(zone, start))
>>>>>>>> start = zone->zone_start_pfn;
>>>>>>>> +#else
>>>>>>>> + if (!zone_spans_pfn(zone, start))
>>>>>>>> + start = pfn;
>>>>>>>> +#endif
>>>>>>>> if (!zone_spans_pfn(zone, end))
>>>>>>>> return false;
>>>>>>>> I can still trigger warnings.
>>>>>>>
>>>>>>> OK. One thing to note is that the page type in the warning changed from
>>>>>>> 5 (MIGRATE_ISOLATE) to 0 (MIGRATE_UNMOVABLE) with my suggested change.
>>>>>>>
>>>>>>
>>>>>> Just to be really clear,
>>>>>> - the 5 (MIGRATE_ISOLATE) warning was from the __alloc_pages call path.
>>>>>> - the 0 (MIGRATE_UNMOVABLE) as above was from the alloc_contig_range call
>>>>>> path WITHOUT your change.
>>>>>>
>>>>>> I am guessing the difference here has more to do with the allocation path?
>>>>>>
>>>>>> I went back and reran focusing on the specific migrate type.
>>>>>> Without your patch, and coming from the alloc_contig_range call path,
>>>>>> I got two warnings of 'page type is 0, passed migratetype is 1' as above.
>>>>>> With your patch I got one 'page type is 0, passed migratetype is 1'
>>>>>> warning and one 'page type is 1, passed migratetype is 0' warning.
>>>>>>
>>>>>> I could be wrong, but I do not think your patch changes things.
>>>>>
>>>>> Got it. Thanks for the clarification.
>>>>>>
>>>>>>>>
>>>>>>>> One idea about recreating the issue is that it may have to do with size
>>>>>>>> of my VM (16G) and the requested allocation sizes 4G. However, I tried
>>>>>>>> to really stress the allocations by increasing the number of hugetlb
>>>>>>>> pages requested and that did not help. I also noticed that I only seem
>>>>>>>> to get two warnings and then they stop, even if I continue to run the
>>>>>>>> script.
>>>>>>>>
>>>>>>>> Zi asked about my config, so it is attached.
>>>>>>>
>>>>>>> With your config, I still have no luck reproducing the issue. I will keep
>>>>>>> trying. Thanks.
>>>>>>>
>>>>>>
>>>>>> Perhaps try running both scripts in parallel?
>>>>>
>>>>> Yes. It seems to do the trick.
>>>>>
>>>>>> Adjust the number of hugetlb pages allocated to equal 25% of memory?
>>>>>
>>>>> I am able to reproduce it with the script below:
>>>>>
>>>>> while true; do
>>>>> echo 4 > /sys/kernel/mm/hugepages/hugepages-1048576kB/nr_hugepages&
>>>>> echo 2048 > /sys/kernel/mm/hugepages/hugepages-2048kB/nr_hugepages&
>>>>> wait
>>>>> echo 0 > /sys/kernel/mm/hugepages/hugepages-2048kB/nr_hugepages
>>>>> echo 0 > /sys/kernel/mm/hugepages/hugepages-1048576kB/nr_hugepages
>>>>> done
>>>>>
>>>>> I will look into the issue.
>>>
>>> Nice!
>>>
>>> I managed to reproduce it ONCE, triggering it not even a second after
>>> starting the script. But I can't seem to do it twice, even after
>>> several reboots and letting it run for minutes.
>>
>> I managed to reproduce it reliably by cutting the nr_hugepages
>> parameters respectively in half.
>>
>> The one that triggers for me is always MIGRATE_ISOLATE. With some
>> printk-tracing, the scenario seems to be this:
>>
>> #0 #1
>> start_isolate_page_range()
>> isolate_single_pageblock()
>> set_migratetype_isolate(tail)
>> lock zone->lock
>> move_freepages_block(tail) // nop
>> set_pageblock_migratetype(tail)
>> unlock zone->lock
>> del_page_from_freelist(head)
>> expand(head, head_mt)
>> WARN(head_mt != tail_mt)
>> start_pfn = ALIGN_DOWN(MAX_ORDER_NR_PAGES)
>> for (pfn = start_pfn, pfn < end_pfn)
>> if (PageBuddy())
>> split_free_page(head)
>>
>> IOW, we update a pageblock that isn't MAX_ORDER aligned, then drop the
>> lock. The move_freepages_block() does nothing because the PageBuddy()
>> is set on the pageblock to the left. Once we drop the lock, the buddy
>> gets allocated and the expand() puts things on the wrong list. The
>> splitting code that handles MAX_ORDER blocks runs *after* the tail
>> type is set and the lock has been dropped, so it's too late.
>
> Yes, this is the issue I can confirm as well. But it is intentional to enable
> allocating a contiguous range at pageblock granularity instead of MAX_ORDER
> granularity. With your changes below, it no longer works, because if there
> is an unmovable page in
> [ALIGN_DOWN(start_pfn, MAX_ORDER_NR_PAGES), pageblock_start_pfn(start_pfn)),
> the allocation fails but it would succeed in current implementation.
>
> I think a proper fix would be to make move_freepages_block() split the
> MAX_ORDER page and put the split pages in the right migratetype free lists.
>
> I am working on that.

After spending half a day on this, I think it is much harder than I thought
to get alloc_contig_range() working with the freelist migratetype hygiene
patchset. Because alloc_contig_range() relies on racy migratetype changes:

1. pageblocks in the range are first marked as MIGRATE_ISOLATE to prevent
another parallel isolation, but they are not moved to the MIGRATE_ISOLATE
free list yet.

2. later in the process, isolate_freepages_range() is used to actually grab
the free pages.

3. there was no problem when alloc_contig_range() works on MAX_ORDER aligned
ranges, since MIGRATE_ISOLATE cannot be set in the middle of free pages or
in-use pages. But it is not the case when alloc_contig_range() work on
pageblock aligned ranges. Now during isolation phase, free or in-use pages
will need to be split to get their subpages into the right free lists.

4. the hardest case is when a in-use page sits across two pageblocks, currently,
the code just isolate one pageblock, migrate the page, and let split_free_page()
to correct the free list later. But to strictly enforce freelist migratetype
hygiene, extra work is needed at free page path to split the free page into
the right freelists.

I need more time to think about how to get alloc_contig_range() properly.
Help is needed for the bullet point 4.

Thanks.

PS: One observation is that after move_to_free_list(), a page's migratetype
does not match the migratetype of its free list. I might need to make
changes on top of your patchset to get alloc_contig_range() working.


--
Best Regards,
Yan, Zi

Attachment: signature.asc
Description: OpenPGP digital signature