Re: [RESEND PATCH] mm: align larger anonymous mappings on THP boundaries

From: Yang Shi
Date: Mon Jan 22 2024 - 15:20:46 EST


On Mon, Jan 22, 2024 at 3:37 AM Ryan Roberts <ryan.roberts@xxxxxxx> wrote:
>
> On 20/01/2024 16:39, Matthew Wilcox wrote:
> > On Sat, Jan 20, 2024 at 12:04:27PM +0000, Ryan Roberts wrote:
> >> However, after this patch, each allocation is in its own VMA, and there is a 2M
> >> gap between each VMA. This causes 2 problems: 1) mmap becomes MUCH slower
> >> because there are so many VMAs to check to find a new 1G gap. 2) It fails once
> >> it hits the VMA limit (/proc/sys/vm/max_map_count). Hitting this limit then
> >> causes a subsequent calloc() to fail, which causes the test to fail.
> >>
> >> Looking at the code, I think the problem is that arm64 selects
> >> ARCH_WANT_DEFAULT_TOPDOWN_MMAP_LAYOUT. But __thp_get_unmapped_area() allocates
> >> len+2M then always aligns to the bottom of the discovered gap. That causes the
> >> 2M hole. As far as I can see, x86 allocates bottom up, so you don't get a hole.
> >
> > As a quick hack, perhaps
> > #ifdef ARCH_WANT_DEFAULT_TOPDOWN_MMAP_LAYOUT
> > take-the-top-half
> > #else
> > current-take-bottom-half-code
> > #endif
> >
> > ?
>
> There is a general problem though that there is a trade-off between abutting
> VMAs, and aligning them to PMD boundaries. This patch has decided that in
> general the latter is preferable. The case I'm hitting is special though, in
> that both requirements could be achieved but currently are not.
>
> The below fixes it, but I feel like there should be some bitwise magic that
> would give the correct answer without the conditional - but my head is gone and
> I can't see it. Any thoughts?
>
> Beyond this, though, there is also a latent bug where the offset provided to
> mmap() is carried all the way through to the get_unmapped_area()
> implementation, even for MAP_ANONYMOUS - I'm pretty sure we should be
> force-zeroing it for MAP_ANONYMOUS? Certainly before this change, for arches
> that use the default get_unmapped_area(), any non-zero offset would not have
> been used. But this change starts using it, which is incorrect. That said, there
> are some arches that override the default get_unmapped_area() and do use the
> offset. So I'm not sure if this is a bug or a feature that user space can pass
> an arbitrary value to the implementation for anon memory?
>
> Finally, the second test failure I reported (ksm_tests) is actually caused by a
> bug in the test code, but provoked by this change. So I'll send out a fix for
> the test code separately.

Thanks for figuring this out.

>
>
> diff --git a/mm/huge_memory.c b/mm/huge_memory.c
> index 4f542444a91f..68ac54117c77 100644
> --- a/mm/huge_memory.c
> +++ b/mm/huge_memory.c
> @@ -632,7 +632,7 @@ static unsigned long __thp_get_unmapped_area(struct file *filp,
>  {
>  	loff_t off_end = off + len;
>  	loff_t off_align = round_up(off, size);
> -	unsigned long len_pad, ret;
> +	unsigned long len_pad, ret, off_sub;
>
>  	if (off_end <= off_align || (off_end - off_align) < size)
>  		return 0;
> @@ -658,7 +658,13 @@ static unsigned long __thp_get_unmapped_area(struct file *filp,
>  	if (ret == addr)
>  		return addr;
>
> -	ret += (off - ret) & (size - 1);
> +	off_sub = (off - ret) & (size - 1);
> +
> +	if (current->mm->get_unmapped_area == arch_get_unmapped_area_topdown &&
> +	    !off_sub)
> +		return ret + size;
> +
> +	ret += off_sub;
>  	return ret;
>  }
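
For what it's worth, the arithmetic in the diff can be tried out in user
space. The following is only a minimal sketch, not kernel code: the helper
names (align_bottom, align_topdown_patch, align_topdown_nobranch, hole_above)
are made up for illustration, addresses are plain uint64_t, and size is
assumed to be a power of two. The last alignment helper is one possible
answer to the "bitwise magic" question: it picks the highest address
<= ret + size that is congruent to off modulo size. That is only the right
choice for a top-down layout, so the check on get_unmapped_area would
presumably still be needed to select between the two rules.

```c
#include <assert.h>
#include <stdint.h>

/* Existing rule: lowest address >= ret congruent to off modulo size. */
static uint64_t align_bottom(uint64_t ret, uint64_t off, uint64_t size)
{
	assert(size && !(size & (size - 1)));	/* power of two assumed */
	return ret + ((off - ret) & (size - 1));
}

/* Rule from the quoted diff, as it would apply to a top-down layout. */
static uint64_t align_topdown_patch(uint64_t ret, uint64_t off, uint64_t size)
{
	uint64_t off_sub = (off - ret) & (size - 1);

	assert(size && !(size & (size - 1)));
	if (!off_sub)
		return ret + size;
	return ret + off_sub;
}

/*
 * Hypothetical branch-free form: highest address <= ret + size that is
 * congruent to off modulo size. Unsigned wrap-around is harmless because
 * size divides 2^64.
 */
static uint64_t align_topdown_nobranch(uint64_t ret, uint64_t off, uint64_t size)
{
	assert(size && !(size & (size - 1)));
	return ret + size - ((ret - off) & (size - 1));
}

/* Returns 1 if both top-down forms agree for every 4K step in one window. */
static int forms_agree(uint64_t base, uint64_t off, uint64_t size)
{
	for (uint64_t ret = base; ret < base + size; ret += 4096)
		if (align_topdown_patch(ret, off, size) !=
		    align_topdown_nobranch(ret, off, size))
			return 0;
	return 1;
}

/*
 * Model of one top-down allocation: the search hands back the gap of
 * len + size bytes that ends at top (the start of the VMA above). The
 * return value is the hole left between the new mapping and that VMA.
 */
static uint64_t hole_above(uint64_t top, uint64_t len, uint64_t off,
			   uint64_t size, int topdown_rule)
{
	uint64_t ret = top - (len + size);
	uint64_t addr = topdown_rule ? align_topdown_nobranch(ret, off, size)
				     : align_bottom(ret, off, size);

	return top - (addr + len);
}
```

With top = 0x7f0000000000, len = 4M, off = 0 and size = 2M, hole_above()
gives 2M for the bottom rule and 0 for the top-down rule, which matches the
2M hole Ryan described. When top is not 2M aligned, both rules land on the
same address, so the trade-off between abutting and alignment remains.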