[PATCH 1/2] mm/hugetlb: Insert page cache for UFFDIO_COPY even if private

From: Peter Xu
Date: Wed Sep 28 2022 - 17:44:00 EST


UFFDIO_COPY resolves page faults at the page cache layer for file-backed
memory on shmem and hugetlbfs. This means that for each UFFDIO_COPY we
should insert the new page into the page cache, regardless of whether the
mapping is private or shared.

We probably did not do that before because, for private mappings, the
process that maps the file privately must not be allowed to write to the
page cache. However, that can be guaranteed by removing the write bit (as
this patch does), so that CoW triggers properly on the page cache page.
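
As a rough illustration of the write-bit rule described above, here is a
small userspace model of the decision before and after this patch. It is
only a sketch, not kernel code: MODEL_VM_WRITE and both helper names are
made up for illustration, and only the boolean conditions are taken from
the diff below.

```c
#include <stdbool.h>

/* Hypothetical stand-in for the kernel's VM_WRITE; the value is illustrative. */
#define MODEL_VM_WRITE 0x2UL

/* Write-bit decision before the patch: only CONTINUE on a private VMA
 * (or a wp copy) dropped the write bit. */
static bool writable_before(bool wp_copy, bool is_continue, bool vm_shared,
			    unsigned long vm_flags)
{
	if (wp_copy || (is_continue && !vm_shared))
		return false;
	return (vm_flags & MODEL_VM_WRITE) != 0;
}

/* Write-bit decision after the patch: any private VMA maps the page
 * cache page read-only, so a later write goes through CoW. */
static bool writable_after(bool wp_copy, bool vm_shared,
			   unsigned long vm_flags)
{
	if (wp_copy || !vm_shared)
		return false;
	return (vm_flags & MODEL_VM_WRITE) != 0;
}
```

In this model the only input whose result changes is UFFDIO_COPY without
wp on a writable private VMA: writable_before() returns true there, while
writable_after() returns false.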

Leaving the page cache empty can lead to the following sequence:

  (1) map hugetlb privately, register with uffd missing+wp
  (2) read page, trigger MISSING event with READ
  (3) UFFDIO_COPY(wp=1) resolves the page fault, page is kept wr-protected
  (4) write page, trigger MISSING event again with WRITE (because the
      page cache is missing!)

This behavior has existed since the initial commit of hugetlb MISSING mode
support, which is commit 1c9e8def43a3 ("userfaultfd: hugetlbfs: add
UFFDIO_COPY support for shared mappings", 2017-02-22). In most cases it
is fine as long as the hugetlb page/pte stays stable (e.g., no
wr-protect, no MADV_DONTNEED, ...). However, if a further page fault is
triggered for any reason, it can cause problems. With the newly
introduced uffd-wp on hugetlbfs and a recent locking rework from Mike,
the userfaultfd kselftest can now easily fail with hugetlb private mappings.

A further step would be to do early CoW when the private mapping is
writable, but let's leave that for later.

Cc: Andrea Arcangeli <aarcange@xxxxxxxxxx>
Cc: Mike Kravetz <mike.kravetz@xxxxxxxxxx>
Cc: Mike Rapoport <rppt@xxxxxxxxxxxxxxxxxx>
Cc: Andrew Morton <akpm@xxxxxxxxxxxxxxxxxxxx>
Signed-off-by: Peter Xu <peterx@xxxxxxxxxx>
---
mm/hugetlb.c | 28 ++++++++--------------------
1 file changed, 8 insertions(+), 20 deletions(-)

diff --git a/mm/hugetlb.c b/mm/hugetlb.c
index 9679fe519b90..a43fc6852f27 100644
--- a/mm/hugetlb.c
+++ b/mm/hugetlb.c
@@ -5933,14 +5933,12 @@ int hugetlb_mcopy_atomic_pte(struct mm_struct *dst_mm,
 	int ret = -ENOMEM;
 	struct page *page;
 	int writable;
-	bool page_in_pagecache = false;

 	if (is_continue) {
 		ret = -EFAULT;
 		page = find_lock_page(mapping, idx);
 		if (!page)
 			goto out;
-		page_in_pagecache = true;
 	} else if (!*pagep) {
 		/* If a page already exists, then it's UFFDIO_COPY for
 		 * a non-missing case. Return -EEXIST.
@@ -6014,8 +6012,8 @@ int hugetlb_mcopy_atomic_pte(struct mm_struct *dst_mm,
 	 */
 	__SetPageUptodate(page);

-	/* Add shared, newly allocated pages to the page cache. */
-	if (vm_shared && !is_continue) {
+	/* Add newly allocated pages to the page cache for UFFDIO_COPY. */
+	if (!is_continue) {
 		size = i_size_read(mapping->host) >> huge_page_shift(h);
 		ret = -EFAULT;
 		if (idx >= size)
@@ -6030,7 +6028,6 @@ int hugetlb_mcopy_atomic_pte(struct mm_struct *dst_mm,
 		ret = hugetlb_add_to_page_cache(page, mapping, idx);
 		if (ret)
 			goto out_release_nounlock;
-		page_in_pagecache = true;
 	}

 	ptl = huge_pte_lock(h, dst_mm, dst_pte);
@@ -6044,18 +6041,13 @@ int hugetlb_mcopy_atomic_pte(struct mm_struct *dst_mm,
 	if (!huge_pte_none_mostly(huge_ptep_get(dst_pte)))
 		goto out_release_unlock;

-	if (page_in_pagecache) {
-		page_dup_file_rmap(page, true);
-	} else {
-		ClearHPageRestoreReserve(page);
-		hugepage_add_new_anon_rmap(page, dst_vma, dst_addr);
-	}
+	page_dup_file_rmap(page, true);

 	/*
-	 * For either: (1) CONTINUE on a non-shared VMA, or (2) UFFDIO_COPY
-	 * with wp flag set, don't set pte write bit.
+	 * For either: (1) a non-shared VMA, or (2) UFFDIO_COPY with wp
+	 * flag set, don't set pte write bit.
 	 */
-	if (wp_copy || (is_continue && !vm_shared))
+	if (wp_copy || !vm_shared)
 		writable = 0;
 	else
 		writable = dst_vma->vm_flags & VM_WRITE;
@@ -6083,18 +6075,14 @@ int hugetlb_mcopy_atomic_pte(struct mm_struct *dst_mm,
 	spin_unlock(ptl);
 	if (!is_continue)
 		SetHPageMigratable(page);
-	if (vm_shared || is_continue)
-		unlock_page(page);
+	unlock_page(page);
 	ret = 0;
 out:
 	return ret;
 out_release_unlock:
 	spin_unlock(ptl);
-	if (vm_shared || is_continue)
-		unlock_page(page);
+	unlock_page(page);
 out_release_nounlock:
-	if (!page_in_pagecache)
-		restore_reserve_on_error(h, dst_vma, dst_addr, page);
 	put_page(page);
 	goto out;
 }
--
2.32.0

