Re: [RESEND PATCH] mm: align larger anonymous mappings on THP boundaries

From: Yang Shi
Date: Tue Jan 23 2024 - 12:26:58 EST


On Tue, Jan 23, 2024 at 9:14 AM Yang Shi <shy828301@xxxxxxxxx> wrote:
>
> On Tue, Jan 23, 2024 at 1:41 AM Ryan Roberts <ryan.roberts@xxxxxxx> wrote:
> >
> > On 22/01/2024 19:43, Yang Shi wrote:
> > > On Mon, Jan 22, 2024 at 3:37 AM Ryan Roberts <ryan.roberts@xxxxxxx> wrote:
> > >>
> > >> On 20/01/2024 16:39, Matthew Wilcox wrote:
> > >>> On Sat, Jan 20, 2024 at 12:04:27PM +0000, Ryan Roberts wrote:
> > >>>> However, after this patch, each allocation is in its own VMA, and there is a 2M
> > >>>> gap between each VMA. This causes 2 problems: 1) mmap becomes MUCH slower
> > >>>> because there are so many VMAs to check to find a new 1G gap. 2) It fails once
> > >>>> it hits the VMA limit (/proc/sys/vm/max_map_count). Hitting this limit then
> > >>>> causes a subsequent calloc() to fail, which causes the test to fail.
> > >>>>
> > >>>> Looking at the code, I think the problem is that arm64 selects
> > >>>> ARCH_WANT_DEFAULT_TOPDOWN_MMAP_LAYOUT. But __thp_get_unmapped_area() allocates
> > >>>> len+2M then always aligns to the bottom of the discovered gap. That causes the
> > >>>> 2M hole. As far as I can see, x86 allocates bottom up, so you don't get a hole.
> > >>>
> > >>> As a quick hack, perhaps
> > >>> #ifdef ARCH_WANT_DEFAULT_TOPDOWN_MMAP_LAYOUT
> > >>> take-the-top-half
> > >>> #else
> > >>> current-take-bottom-half-code
> > >>> #endif
> > >>>
> > >>> ?
> > >
> > > > Thanks for the suggestion. It makes sense to me. The alignment
> > > > needs to take this into account.
> > >
> > >>
> > >> There is a general problem though that there is a trade-off between abutting
> > >> VMAs, and aligning them to PMD boundaries. This patch has decided that in
> > >> general the latter is preferable. The case I'm hitting is special though, in
> > >> that both requirements could be achieved but currently are not.
> > >>
> > >> The below fixes it, but I feel like there should be some bitwise magic that
> > >> would give the correct answer without the conditional - but my head is gone and
> > >> I can't see it. Any thoughts?
> > >
> > > Thanks Ryan for the patch. TBH I didn't see a bitwise magic without
> > > the conditional either.
> > >
> > >>
> > >> Beyond this, though, there is also a latent bug where the offset provided to
> > >> mmap() is carried all the way through to the get_unmapped_area()
> > > >> implementation, even for MAP_ANONYMOUS - I'm pretty sure we should be
> > >> force-zeroing it for MAP_ANONYMOUS? Certainly before this change, for arches
> > >> that use the default get_unmapped_area(), any non-zero offset would not have
> > >> been used. But this change starts using it, which is incorrect. That said, there
> > >> are some arches that override the default get_unmapped_area() and do use the
> > >> offset. So I'm not sure if this is a bug or a feature that user space can pass
> > >> an arbitrary value to the implementation for anon memory??
> > >
> > > > Thanks for noticing this. If I read the code correctly, the pgoff is used
> > > > by some arches to work around VIPT caches, and it looks like it is for
> > > shared mapping only (just checked arm and mips). And I believe
> > > everybody assumes 0 should be used when doing anonymous mapping. The
> > > offset should have nothing to do with seeking proper unmapped virtual
> > > area. But the pgoff does make sense for file THP due to the alignment
> > > requirements. I think it should be zero'ed for anonymous mappings,
> > > like:
> > >
> > > diff --git a/mm/mmap.c b/mm/mmap.c
> > > index 2ff79b1d1564..a9ed353ce627 100644
> > > --- a/mm/mmap.c
> > > +++ b/mm/mmap.c
> > > @@ -1830,6 +1830,7 @@ get_unmapped_area(struct file *file, unsigned
> > > long addr, unsigned long len,
> > > pgoff = 0;
> > > get_area = shmem_get_unmapped_area;
> > > } else if (IS_ENABLED(CONFIG_TRANSPARENT_HUGEPAGE)) {
> > > + pgoff = 0;
> > > /* Ensures that larger anonymous mappings are THP aligned. */
> > > get_area = thp_get_unmapped_area;
> > > }
> >
> > I think it would be cleaner to just zero pgoff if file==NULL, then it covers the
> > shared case, the THP case, and the non-THP case properly. I'll prepare a
> > separate patch for this.
>
> IIUC I don't think this is ok for those arches which have to
> work around VIPT caches, since MAP_ANONYMOUS | MAP_SHARED with a NULL file
> pointer is a common case for creating a tmpfs mapping. For example,
> arm's arch_get_unmapped_area() has:
>
> 	if (aliasing)
> 		do_align = filp || (flags & MAP_SHARED);
>
> The pgoff is needed if do_align is true. So we should just zero pgoff
> iff !file && !MAP_SHARED, as my patch does; we can move the
> zeroing to a better place.

Rethinking this... zeroing pgoff when file is NULL should be ok, since
a MAP_ANONYMOUS | MAP_SHARED mapping should typically have a zero offset.
I'm not aware of any use case with a non-zero offset, or at least no sane one...
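
To make the two options concrete, here is a minimal userspace sketch of the
two zeroing policies discussed in this thread. It is not kernel code: the
helper names are made up for illustration, and "has_file" stands in for the
kernel's file != NULL check.

```c
#include <stdbool.h>
#include <sys/mman.h>	/* MAP_SHARED, MAP_PRIVATE */

/*
 * Hypothetical userspace model, not kernel code.
 *
 * Policy A: zero pgoff only for private anonymous mappings
 * (!file && !MAP_SHARED), leaving it intact for shared anonymous mappings.
 */
static unsigned long pgoff_zero_private_anon(bool has_file, unsigned long flags,
					     unsigned long pgoff)
{
	if (!has_file && !(flags & MAP_SHARED))
		return 0;
	return pgoff;
}

/*
 * Policy B: zero pgoff whenever there is no file, which also covers
 * MAP_ANONYMOUS | MAP_SHARED.
 */
static unsigned long pgoff_zero_no_file(bool has_file, unsigned long flags,
					unsigned long pgoff)
{
	(void)flags;	/* the flags do not matter under this policy */
	return has_file ? pgoff : 0;
}
```

The two policies only differ for a shared anonymous mapping that is given a
non-zero offset, which is the case I don't expect anybody to rely on.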

>
> >
> >
> > >
> > >>
> > >> Finally, the second test failure I reported (ksm_tests) is actually caused by a
> > >> bug in the test code, but provoked by this change. So I'll send out a fix for
> > >> the test code separately.
> > >>
> > >>
> > >> diff --git a/mm/huge_memory.c b/mm/huge_memory.c
> > >> index 4f542444a91f..68ac54117c77 100644
> > >> --- a/mm/huge_memory.c
> > >> +++ b/mm/huge_memory.c
> > >> @@ -632,7 +632,7 @@ static unsigned long __thp_get_unmapped_area(struct file *filp,
> > >> {
> > >> loff_t off_end = off + len;
> > >> loff_t off_align = round_up(off, size);
> > >> - unsigned long len_pad, ret;
> > >> + unsigned long len_pad, ret, off_sub;
> > >>
> > >> if (off_end <= off_align || (off_end - off_align) < size)
> > >> return 0;
> > >> @@ -658,7 +658,13 @@ static unsigned long __thp_get_unmapped_area(struct file *filp,
> > >> if (ret == addr)
> > >> return addr;
> > >>
> > >> - ret += (off - ret) & (size - 1);
> > >> + off_sub = (off - ret) & (size - 1);
> > >> +
> > >> + if (current->mm->get_unmapped_area == arch_get_unmapped_area_topdown &&
> > >> + !off_sub)
> > >> + return ret + size;
> > >> +
> > >> + ret += off_sub;
> > >> return ret;
> > >> }
> > >
> > > I didn't spot any problem, would you please come up with a formal patch?
> >
> > Yeah, I'll aim to post today.
>
> Thanks!
>
> >
> >
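
For reference, the address arithmetic in the quoted __thp_get_unmapped_area()
change can be modelled in userspace. This is only a sketch under stated
assumptions: thp_pick_addr() and its "topdown" flag are hypothetical stand-ins
for the kernel's comparison against arch_get_unmapped_area_topdown, and "ret"
stands for the address returned by the padded (len + size) gap search.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical userspace model of the quoted change, not kernel code.
 *
 * ret:     start of the padded gap found by the search
 * off:     file offset in bytes (0 for anonymous memory)
 * size:    alignment, e.g. 2M for a PMD-sized THP; must be a power of two
 * topdown: true if the address space is allocated top-down
 */
static uint64_t thp_pick_addr(uint64_t ret, uint64_t off, uint64_t size,
			      bool topdown)
{
	uint64_t off_sub;

	assert(size != 0 && (size & (size - 1)) == 0);

	/* Distance from ret up to the next address congruent to off mod size. */
	off_sub = (off - ret) & (size - 1);

	/*
	 * If the gap already starts aligned, all of the padding sits at the
	 * top. For top-down layouts, take the top of the padded gap instead
	 * so the new mapping abuts the mapping above it.
	 */
	if (topdown && off_sub == 0)
		return ret + size;

	return ret + off_sub;
}
```

As a worked case, take a 1G mapping, size = 2M, and a previous VMA starting at
0x7f0000000000. The padded top-down search returns ret = 0x7effbfe00000, which
is already 2M aligned. Keeping ret leaves a 2M hole below the previous VMA,
while ret + size = 0x7effc0000000 makes the new mapping end exactly at
0x7f0000000000.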