Re: [PATCH v2 2/3] userfaultfd: UFFDIO_REMAP uABI

From: David Hildenbrand
Date: Thu Sep 28 2023 - 13:16:06 EST


On 27.09.23 20:25, Suren Baghdasaryan wrote:

I have some cleanups pending for page_move_anon_rmap(), that moves the
SetPageAnonExclusive hunk out. Here we should be using
page_move_anon_rmap() [or rather, folio_move_anon_rmap() after my cleanups]

I'll send them out soonish.

Should I keep this as is in my next version until you post the
cleanups? I can add a TODO comment to convert it to
folio_move_anon_rmap() once it's ready.

You should just be able to use page_move_anon_rmap() and whatever gets in first cleans it up :)



+ WRITE_ONCE(src_folio->index, linear_page_index(dst_vma,
+ dst_addr)); >> +
+ orig_src_pte = ptep_clear_flush(src_vma, src_addr, src_pte);
+ orig_dst_pte = mk_pte(&src_folio->page, dst_vma->vm_page_prot);
+ orig_dst_pte = maybe_mkwrite(pte_mkdirty(orig_dst_pte),
+ dst_vma);

I think there's still a theoretical issue here that you could fix by
checking for the AnonExclusive flag, similar to the huge page case.

Consider the following scenario:

1. process P1 does a write fault in a private anonymous VMA, creating
and mapping a new anonymous page A1
2. process P1 forks and creates two children P2 and P3. afterwards, A1
is mapped in P1, P2 and P3 as a COW page, with mapcount 3.
3. process P1 removes its mapping of A1, dropping its mapcount to 2.
4. process P2 uses vmsplice() to grab a reference to A1 with get_user_pages()
5. process P2 removes its mapping of A1, dropping its mapcount to 1.

If at this point P3 does a write fault on its mapping of A1, it will
still trigger copy-on-write thanks to the AnonExclusive mechanism; and
this is necessary to avoid P3 mapping A1 as writable and writing data
into it that will become visible to P2, if P2 and P3 are in different
security contexts.

But if P3 instead moves its mapping of A1 to another address with
remap_anon_pte() which only does a page mapcount check, the
maybe_mkwrite() will directly make the mapping writable, circumventing
the AnonExclusive mechanism.


Yes, can_change_pte_writable() contains the exact logic when we can turn
something easily writable even if it wasn't writable before. which
includes that PageAnonExclusive is set. (but with uffd-wp or softdirty
tracking, there is more to consider)

For uffd_remap can_change_pte_writable() would fail it VM_WRITE is not
set, but we want remapping to work for RO memory as well. Are you

In a VMA without VM_WRITE you certainly wouldn't want to make PTEs writable :) That's why that function just does a sanity check that it is not called in strange context. So one would only call it if VM_WRITE is set.

saying that a PageAnonExclusive() check alone would not be enough
here?

There are some interesting questions to ask here:

1) What happens if the old VMA has VM_SOFTDIRTY set but the new one not? You most probably have to mark the PTE softdirty and not make it writable.

2) VM_UFFD_WP requires similar care I assume? Peter might know.

--
Cheers,

David / dhildenb