Re: [PATCH v3 0/4] variable-order, large folios for anonymous memory

From: Zi Yan
Date: Mon Jul 24 2023 - 10:58:18 EST


On 24 Jul 2023, at 7:59, Ryan Roberts wrote:

> On 14/07/2023 17:04, Ryan Roberts wrote:
>> Hi All,
>>
>> This is v3 of a series to implement variable order, large folios for anonymous
>> memory. (currently called "FLEXIBLE_THP") The objective of this is to improve
>> performance by allocating larger chunks of memory during anonymous page faults.
>> See [1] and [2] for background.
>
> A question for anyone that can help; I'm preparing v4 and as part of that am
> running the mm selftests, now that I've fixed them up to run reliably for
> arm64. This is showing 2 regressions vs the v6.5-rc3 baseline:
>
> 1) khugepaged test fails here:
> # Run test: collapse_max_ptes_none (khugepaged:anon)
> # Maybe collapse with max_ptes_none exceeded.... Fail
> # Unexpected huge page
>
> 2) split_huge_page_test fails with:
> # Still AnonHugePages not split
>
> I *think* (but haven't yet verified) that (1) is due to khugepaged ignoring
> non-order-0 folios when looking for candidates to collapse. Now that we have
> large anon folios, the memory allocated by the test is in large folios and
> therefore does not get collapsed. We understand this issue, and I believe
> DavidH's new scheme for determining exclusive vs shared should give us the tools
> to solve this.
>
> But (2) is weird. If I run this test on its own immediately after booting, it
> passes. If I then run the khugepaged test, then re-run this test, it fails.
>
> The test is allocating 4 hugepages, then requesting they are split using the
> debugfs interface. Then the test looks at /proc/self/smaps to check that
> AnonHugePages is back to 0.
>
> In both the passing and failing cases, the kernel thinks that it has
> successfully split the pages; the debug logs in split_huge_pages_pid() confirm
> this. In the failing case, I wonder if somehow khugepaged could be immediately
> re-collapsing the pages before user sapce can observe the split? Perhaps the
> failed khugepaged test has left khugepaged in an "awake" state and it
> immediately pounces?

This is more likely to be a stats issue. Have you checked smap to see if
AnonHugePages is 0 KB by placing a getchar() before the exit(EXIT_FAILURE)?
Since split_huge_page_test checks that stats to make sure the split indeed
happened.

--
Best Regards,
Yan, Zi

Attachment: signature.asc
Description: OpenPGP digital signature