Re: [PATCH RFC] mm/userfaultfd: enable writenotify while userfaultfd-wp is enabled for a VMA

From: Peter Xu
Date: Tue Dec 06 2022 - 16:19:37 EST


Hi, Hugh,

On Tue, Dec 06, 2022 at 11:09:40AM -0800, Hugh Dickins wrote:
> I have not been following the uffd-wp work, but I believe that David's
> painstaking and excellent account of vm_page_prot is correct. Peter,
> please I beg you to follow his advice and go for (1) for uffd-wp.

Thanks for jumping in, I will. Your input definitely matters.

I think the fundamental moot point was whether vm_page_prot is the "safest"
or "default" behavior of vma.

I thought it was the "default" and I don't want to have VM_UFFD_WP regions
to have default no-write attribute comparing to generic ones. I wanted it
to behave exactly like a normal shmem vma if no one explicitly does
something else.

If it's "safest" and you also agree then I think indeed VM_UFFD_WP falls
into that category; frankly I don't have a solid clue before. Maybe that
also matches better with the recent vma_wants_manual_pte_write_upgrade(),
or we start to lose write bit in do_numa_page() at least for uffd-wp
protected ranges when recovering the ptes, and it'll be ugly to have:

diff --git a/include/linux/mm.h b/include/linux/mm.h
index 3a3d23b3dbe2..718d540b9eb4 100644
--- a/include/linux/mm.h
+++ b/include/linux/mm.h
@@ -2086,7 +2086,7 @@ static inline bool vma_wants_manual_pte_write_upgrade(struct vm_area_struct *vma
* private mappings, that's always the case when we have write
* permissions as we properly have to handle COW.
*/
- if (vma->vm_flags & VM_SHARED)
+ if ((vma->vm_flags & VM_SHARED) && !userfaultfd_wp(vma))
return vma_wants_writenotify(vma, vma->vm_page_prot);
return !!(vma->vm_flags & VM_WRITE);

To recover the write bit.

I tried to move forward with the 47ba8bcc33b2 ("mm/migrate: fix read-only
page got writable when recover pte", 2022-11-25) in mm-unstable upstream
because that seems pretty obvious to me and that's verified in the test
beds, but I obviously failed anyway. Let's adjust the patches with the
best we can have.

> I do not share David's faith in "documented": documented or not,
> depart from safe convention and you will be adding (at least the
> opportunity for) serious bugs.

Yes I agree not having the write bit is safer. That's also why I wanted to
move the wrprotect into pte_mkuffd_wp. I assume from "resolving problem"
pov they're the same, but I'll follow your advise.

David, would you please repost a new patch for this one and copy Ives to
make sure it'll work for him in his systems?

I'd suggest to drop the mprotect example, I'll reply later on that to the
other email but I still don't know whether it's a good example for a reader
to understand the problem.

No reproducer needed for numa I think - I guess Ives's test case would be
far enough to verify it if possible. I also hope what Ives saw was the
numa balancing issue you reported, so maybe it'll resolve all problem he
has. Then with that verified and queued we can drop the mm/migrate patch.

Thanks,

--
Peter Xu