Re: [PATCH v2] mm,madvise,hugetlb: fix unexpected data loss with MADV_DONTNEED on hugetlbfs

From: Mike Kravetz
Date: Fri Oct 21 2022 - 19:56:23 EST


On 10/21/22 19:28, Rik van Riel wrote:
> A common use case for hugetlbfs is for the application to create
> memory pools backed by huge pages, which then get handed over to
> some malloc library (e.g. jemalloc) for further management.
>
> That malloc library may be doing MADV_DONTNEED calls on memory
> that is no longer needed, expecting those calls to happen on
> PAGE_SIZE boundaries.
>
> However, currently the MADV_DONTNEED code rounds up any such
> requests to HPAGE_PMD_SIZE boundaries. This leads to undesired
> outcomes when jemalloc expects a 4kB MADV_DONTNEED, but 2MB of
> memory gets zeroed out instead.
>
> Use of pre-built shared libraries means that user code does not
> always know the page size of every memory arena in use.
>
> Avoid unexpected data loss with MADV_DONTNEED by rounding up
> only to PAGE_SIZE (in do_madvise), and rounding the end of the
> range down to huge page granularity.
>
> That way programs will only get as much memory zeroed out as
> they requested.
>
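
To make the failure mode concrete, a minimal userspace sketch might look
like the one below (the 2MB huge page size, the anonymous MAP_HUGETLB
mapping and the offsets are illustrative assumptions, not taken from
jemalloc):

#define _GNU_SOURCE
#include <stdio.h>
#include <sys/mman.h>

int main(void)
{
        size_t hpage = 2UL << 20;       /* assumed 2MB huge page size */

        /*
         * The application builds a huge-page-backed pool for the allocator.
         * Needs at least one huge page reserved via /proc/sys/vm/nr_hugepages.
         */
        char *pool = mmap(NULL, hpage, PROT_READ | PROT_WRITE,
                          MAP_PRIVATE | MAP_ANONYMOUS | MAP_HUGETLB, -1, 0);
        if (pool == MAP_FAILED) {
                perror("mmap");
                return 1;
        }
        pool[0] = 1;
        pool[4096] = 2;

        /* The allocator thinks in 4kB pages and frees just the first one... */
        madvise(pool, 4096, MADV_DONTNEED);

        /*
         * ...but with the old rounding the whole 2MB page was zapped,
         * so the byte at offset 4096 is zeroed as well.
         */
        printf("pool[4096] = %d\n", pool[4096]);

        munmap(pool, hpage);
        return 0;
}

With the old rounding the whole 2MB page is zapped and pool[4096] reads
back as zero; with this patch the call becomes a no-op and the data
survives.
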
> Cc: Mike Kravetz <mike.kravetz@xxxxxxxxxx>
> Cc: David Hildenbrand <david@xxxxxxxxxx>
> Cc: Andrew Morton <akpm@xxxxxxxxxxxxxxxxxxxx>
> Cc: stable@xxxxxxxxxx
> Fixes: 90e7e7f5ef3f ("mm: enable MADV_DONTNEED for hugetlb mappings")

I do hate changing behavior, but in hindsight this is the right behavior,
especially since the old behavior can cause unexpected data loss.
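
To spell out the end-address arithmetic behind that, here is a standalone
sketch (the 2MB huge page size and the addresses are assumptions; the
macros mirror the kernel's rounding for power-of-two sizes):

#include <stdio.h>

#define HPAGE_SIZE       (2UL << 20)                        /* assumed 2MB huge page */
#define ALIGN(x, a)      (((x) + (a) - 1) & ~((a) - 1))     /* round up, old behavior */
#define ALIGN_DOWN(x, a) ((x) & ~((a) - 1))                 /* round down, new behavior */

int main(void)
{
        unsigned long start = 0x40000000UL;     /* huge-page-aligned start */
        unsigned long end = start + 4096;       /* caller asked for one 4kB page */

        /* Old: end rounds up to start + 2MB, so the whole huge page is zapped. */
        printf("ALIGN:      %lu bytes zapped\n", ALIGN(end, HPAGE_SIZE) - start);

        /* New: end rounds down to start, so start == end and nothing is zapped. */
        printf("ALIGN_DOWN: %lu bytes zapped\n", ALIGN_DOWN(end, HPAGE_SIZE) - start);

        return 0;
}

And with the rounded-down end equal to start, the new start == end check
in madvise_dontneed_free() turns the call into a successful no-op rather
than an error, which is what a PAGE_SIZE-oriented caller would expect.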

Reviewed-by: Mike Kravetz <mike.kravetz@xxxxxxxxxx>
--
Mike Kravetz

> ---
> v2: split out the most urgent fix for stable backports
>
> mm/madvise.c | 12 +++++++++++-
> 1 file changed, 11 insertions(+), 1 deletion(-)
>
> diff --git a/mm/madvise.c b/mm/madvise.c
> index 2baa93ca2310..c7105ec6d08c 100644
> --- a/mm/madvise.c
> +++ b/mm/madvise.c
> @@ -813,7 +813,14 @@ static bool madvise_dontneed_free_valid_vma(struct vm_area_struct *vma,
>  	if (start & ~huge_page_mask(hstate_vma(vma)))
>  		return false;
>
> -	*end = ALIGN(*end, huge_page_size(hstate_vma(vma)));
> +	/*
> +	 * Madvise callers expect the length to be rounded up to PAGE_SIZE
> +	 * boundaries, and may be unaware that this VMA uses huge pages.
> +	 * Avoid unexpected data loss by rounding down the number of
> +	 * huge pages freed.
> +	 */
> +	*end = ALIGN_DOWN(*end, huge_page_size(hstate_vma(vma)));
> +
>  	return true;
>  }
>
> @@ -828,6 +835,9 @@ static long madvise_dontneed_free(struct vm_area_struct *vma,
>  	if (!madvise_dontneed_free_valid_vma(vma, start, &end, behavior))
>  		return -EINVAL;
>
> +	if (start == end)
> +		return 0;
> +
>  	if (!userfaultfd_remove(vma, start, end)) {
>  		*prev = NULL; /* mmap_lock has been dropped, prev is stale */
>
> --
> 2.37.2
>
>