Re: [PATCH] mm/hugetlb: revert use of page_cache_next_miss()

From: Mike Kravetz
Date: Fri May 05 2023 - 17:59:03 EST


On 05/05/23 11:53, Sidhartha Kumar wrote:
> As reported by Ackerley[1], the use of page_cache_next_miss() in
> hugetlbfs_fallocate() introduces a bug where a second fallocate() call to
> same offset fails with -EEXIST. Revert this change and go back to the
> previous method of using get from the page cache and then dropping the
> reference on success.
>
> hugetlbfs_pagecache_present() was also refactored to use
> page_cache_next_miss(), revert the usage there as well.
>
> User visible impacts include hugetlb fallocate incorrectly returning
> EEXIST if pages are already present in the file. In addition, hugetlb
> pages will not be included in core dumps if they need to be brought in via
> GUP. userfaultfd UFFDIO_COPY also uses this code and will not notice pages
> already present in the cache. It may try to allocate a new page and
> potentially return ENOMEM as opposed to EEXIST.
>
> Fixes: d0ce0e47b323 ("mm/hugetlb: convert hugetlb fault paths to use alloc_hugetlb_folio()")

Small nit and a question for people more familiar with stable backports.

d0ce0e47b323 added the usage of page_cache_next_miss to hugetlb fallocate.
91a2fb956ad99 added the usage to hugetlbfs_pagecache_present. Both are
in v6.3 and d0ce0e47b323 (referenced here) comes later. So, I 'think' it
is OK to fix both instances with a single patch and reference the commit
where both are present. Or, should there be two patches which is more
technically correct?

> Cc: <stable@xxxxxxxxxxxxxxx> #v6.3+
> Reported-by: Ackerley Tng <ackerleytng@xxxxxxxxxx>
> Signed-off-by: Sidhartha Kumar <sidhartha.kumar@xxxxxxxxxx>
>
> [1] https://lore.kernel.org/linux-mm/cover.1683069252.git.ackerleytng@xxxxxxxxxx/
> ---
> This patch is meant to fix stable v6.3.1 as safe as possible by doing a
> simple revert.
>
> Patch page cache: fix page_cache_next/prev_miss off by one by Mike is a
> potential fix that will allow the use of page_cache_next_miss() and is
> awaiting review.
>
> Patch Fix fallocate error in hugetlbfs when fallocating again by Ackerley
> is another fix but introduces a new function and is also awaiting review.
>
> fs/hugetlbfs/inode.c | 8 +++-----
> mm/hugetlb.c | 11 +++++------
> 2 files changed, 8 insertions(+), 11 deletions(-)

IMO, this is safest and simplest way of fixing v6.3. My proposed changes to
page_cache_next/prev_miss have the potential to impact readahead, so really
need review/testing by someone more familiar with that. If a fix is
urgently needed, I would suggest using this for backport and then either
use my patch or expand Ackerley's proposal to move forward.

As a backport to stable,
Reviewed-by: Mike Kravetz <mike.kravetz@xxxxxxxxxx>
--
Mike Kravetz

>
> diff --git a/fs/hugetlbfs/inode.c b/fs/hugetlbfs/inode.c
> index 9062da6da5675..6d6cd8f26d76d 100644
> --- a/fs/hugetlbfs/inode.c
> +++ b/fs/hugetlbfs/inode.c
> @@ -821,7 +821,6 @@ static long hugetlbfs_fallocate(struct file *file, int mode, loff_t offset,
> */
> struct folio *folio;
> unsigned long addr;
> - bool present;
>
> cond_resched();
>
> @@ -845,10 +844,9 @@ static long hugetlbfs_fallocate(struct file *file, int mode, loff_t offset,
> mutex_lock(&hugetlb_fault_mutex_table[hash]);
>
> /* See if already present in mapping to avoid alloc/free */
> - rcu_read_lock();
> - present = page_cache_next_miss(mapping, index, 1) != index;
> - rcu_read_unlock();
> - if (present) {
> + folio = filemap_get_folio(mapping, index);
> + if (folio) {
> + folio_put(folio);
> mutex_unlock(&hugetlb_fault_mutex_table[hash]);
> hugetlb_drop_vma_policy(&pseudo_vma);
> continue;
> diff --git a/mm/hugetlb.c b/mm/hugetlb.c
> index 245038a9fe4ea..29ab27d2a3ef5 100644
> --- a/mm/hugetlb.c
> +++ b/mm/hugetlb.c
> @@ -5666,13 +5666,12 @@ static bool hugetlbfs_pagecache_present(struct hstate *h,
> {
> struct address_space *mapping = vma->vm_file->f_mapping;
> pgoff_t idx = vma_hugecache_offset(h, vma, address);
> - bool present;
> -
> - rcu_read_lock();
> - present = page_cache_next_miss(mapping, idx, 1) != idx;
> - rcu_read_unlock();
> + struct folio *folio;
>
> - return present;
> + folio = filemap_get_folio(mapping, idx);
> + if (folio)
> + folio_put(folio);
> + return folio != NULL;
> }
>
> int hugetlb_add_to_page_cache(struct folio *folio, struct address_space *mapping,
> --
> 2.40.0
>