Re: [RESEND PATCH] mm: align larger anonymous mappings on THP boundaries

From: Ryan Roberts
Date: Sat Jan 20 2024 - 07:04:45 EST


On 14/12/2023 22:34, Yang Shi wrote:
> From: Rik van Riel <riel@xxxxxxxxxxx>
>
> Align larger anonymous memory mappings on THP boundaries by going through
> thp_get_unmapped_area if THPs are enabled for the current process.
>
> With this patch, larger anonymous mappings are now THP aligned. When a
> malloc library allocates a 2MB or larger arena, that arena can now be
> mapped with THPs right from the start, which can result in better TLB hit
> rates and execution time.
>
> Link: https://lkml.kernel.org/r/20220809142457.4751229f@xxxxxxxxxxxxxxxxxxxx
> Signed-off-by: Rik van Riel <riel@xxxxxxxxxxx>
> Reviewed-by: Yang Shi <shy828301@xxxxxxxxx>
> Cc: Matthew Wilcox <willy@xxxxxxxxxxxxx>
> Cc: Christopher Lameter <cl@xxxxxxxxx>
> Signed-off-by: Andrew Morton <akpm@xxxxxxxxxxxxxxxxxxxx>
> ---
> This patch was applied to v6.1, but was reverted due to a regression
> report. However, it turned out the regression was not caused by this
> patch. I pinged Andrew to reapply it, but he may have forgotten. Since
> this patch helps promote THP usage, I rebased it onto the latest
> mm-unstable.

Hi Yang,

I'm not sure what regression you are referring to above, but I'm seeing a
performance regression in the virtual_address_range mm selftest on arm64, caused
by this patch (which is now in v6.7).

I see two problems when running the test: 1) it takes much longer to execute,
and 2) the test fails. Both are related:

The (first part of the) test allocates as many 1GB anonymous blocks as it can in
the low 256TB of address space, passing NULL as the addr hint to mmap. Before
this patch, all allocations abutted one another and were contained in a single,
merged VMA. After this patch, each allocation is in its own VMA, with a 2M gap
between each VMA. This causes two problems: 1) mmap becomes MUCH slower because
there are so many VMAs to check when searching for a new 1G gap; 2) it fails
once it hits the VMA limit (/proc/sys/vm/max_map_count). Hitting that limit then
causes a subsequent calloc() to fail, which makes the test fail.
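
For reference, the first phase of the test boils down to a loop like the
below (my simplified sketch, not the selftest source; the real test also
records and validates the returned addresses):

#include <stdio.h>
#include <sys/mman.h>

#define SZ_1GB	(1024UL * 1024 * 1024)

int main(void)
{
	unsigned long count = 0;
	char *ptr;

	/*
	 * Map 1GB anonymous blocks with a NULL hint until the address
	 * space (or VMA limit) is exhausted; every call has to search
	 * the VMA tree for a free 1G gap.
	 */
	for (;;) {
		ptr = mmap(NULL, SZ_1GB, PROT_READ | PROT_WRITE,
			   MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
		if (ptr == MAP_FAILED)
			break;
		count++;
	}
	printf("mapped %lu 1GB blocks\n", count);
	return 0;
}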

Looking at the code, I think the problem is that arm64 selects
ARCH_WANT_DEFAULT_TOPDOWN_MMAP_LAYOUT, but __thp_get_unmapped_area() allocates
len+2M and then always aligns to the bottom of the discovered gap. With a
top-down allocator, that strands the 2M of padding above each mapping, which is
the hole. As far as I can see, x86 allocates bottom up, so you don't get a hole.
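
To illustrate with a userspace model (my sketch of the arithmetic, not
kernel code; the 2M over-allocation and the round-up line
"ret += (off - ret) & (size - 1);" are from __thp_get_unmapped_area() in
v6.7, everything else is illustrative):

#include <stdio.h>

#define SZ_2MB	(2UL * 1024 * 1024)
#define SZ_1GB	(1024UL * 1024 * 1024)

int main(void)
{
	unsigned long top = 1UL << 47;		/* model top-down ceiling */
	unsigned long len = SZ_1GB;
	unsigned long len_pad = len + SZ_2MB;	/* kernel pads len by 2M */
	int i;

	for (i = 0; i < 4; i++) {
		/* top-down search: highest gap big enough for len_pad */
		unsigned long ret = top - len_pad;

		/* kernel round-up; off == 0 for anonymous mappings */
		ret += (0UL - ret) & (SZ_2MB - 1);

		printf("vma %d: [%#lx, %#lx), hole above: %lu KB\n",
		       i, ret, ret + len, (top - (ret + len)) / 1024);
		top = ret;	/* next search continues below this VMA */
	}
	return 0;
}

Each iteration strands the 2M of padding above the new VMA, so consecutive
mappings can never merge.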

I'm not quite sure what the fix is; perhaps __thp_get_unmapped_area() should be
implemented around vm_unmapped_area(), which can manage the alignment more
intelligently?
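
Something like the below, perhaps (a rough, untested sketch only; the
limits are illustrative and the off/align_offset handling for file
mappings would need more thought):

	struct vm_unmapped_area_info info = {
		/* honour the mm's layout instead of assuming bottom-up */
		.flags        = VM_UNMAPPED_AREA_TOPDOWN,
		.length       = len,	/* no 2M over-allocation needed */
		.low_limit    = PAGE_SIZE,
		.high_limit   = current->mm->mmap_base,
		/* have the search return a THP-aligned address directly */
		.align_mask   = HPAGE_PMD_SIZE - 1,
		.align_offset = 0,
	};

	addr = vm_unmapped_area(&info);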

But until/unless someone comes along with a fix, I think this patch should be
reverted.

Thanks,
Ryan


>
>
> mm/mmap.c | 3 +++
> 1 file changed, 3 insertions(+)
>
> diff --git a/mm/mmap.c b/mm/mmap.c
> index 9d780f415be3..dd25a2aa94f7 100644
> --- a/mm/mmap.c
> +++ b/mm/mmap.c
> @@ -2232,6 +2232,9 @@ get_unmapped_area(struct file *file, unsigned long addr, unsigned long len,
>  		 */
>  		pgoff = 0;
>  		get_area = shmem_get_unmapped_area;
> +	} else if (IS_ENABLED(CONFIG_TRANSPARENT_HUGEPAGE)) {
> +		/* Ensures that larger anonymous mappings are THP aligned. */
> +		get_area = thp_get_unmapped_area;
>  	}
> 
>  	addr = get_area(file, addr, len, pgoff, flags);