Re: [PATCH v2 1/2] mm/migrate: Fix read-only page got writable when recover pte

From: David Hildenbrand
Date: Tue Nov 15 2022 - 12:23:11 EST


I consider UFFD-wp a special case: while the default VMA protection might
state that it is writable, you actually want individual PTEs to be
write-protected and have to manually remove the protection.

softdirty tracking is another special case: however, softdirty tracking is
enabled for the whole VMA. For remove_migration_pte() that should be fine (I
guess) because writenotify is active when the VMA needs to track softdirty
bits, and consequently vma->vm_page_prot has the proper default permissions.


I wonder if the following (valid), for example is possible:


1) clear_refs() clears VM_SOFTDIRTY and pte_wrprotect() the pte.
-> writenotify is active and vma->vm_page_prot updated accordingly

VM_SOFTDIRTY is reset due to VMA merging and vma->vm_page_prot is updated
accordingly. See mmap_region() where we set VM_SOFTDIRTY.

If you now migrate the (still write-protected in the PTE) page, it was not
writable, but it can be writable on the destination.

I didn't even notice merging could work with soft-dirty enabled, that's
interesting to know.

Yes I think it's possible and I agree it's safe, as VM_SOFTDIRTY is set for
the merged vma so afaiu the write bit is safe to set. We get a bunch of
false positives but that's how soft-dirty works.

I think the whole problem is easier if we see this at a higher level.
You're discussing this from vma pov and it's fair to do so, at least I
agree with what you mentioned so far and I can't see anything outside
uffd-wp that can be affected. However, it is also true when you noticed we
already have quite a few paragraphs trying to discuss the safety for this
and that, that's the part where I think we need justification and it's not
that "natural".

For "natural", I meant fundamentally we're talking about page migrations
here. The natural way (at least to me) for page migration to happen as a
fundamental rule is that, we leverag the migration pte to make sure the pte
being stable so we can do the migration work, then we "recover" the pte to
present either by a full recovery or just (hopefully) only replace the pfn,
keeping all the rest untouched.

One thing to prove that is we have two migration entries not one (I'm
temporarily put the exclusive read one aside since that's solving different
problems): MIGRATION_READ and MIGRATION_WRITE. If we only rely on vma
flags logically we don't need MIGRATION_READ and MIGRATION_WRITE, we only
need MIGRATION generic swap pte then we recover the write bit from vma
flags and other things (like uffd-wp, currently we have the bit set in swap
pte besides the swap entry type).

So maybe one day we can use two migration types rather than three
(MIGRATION and MIGRATION_EXCLUSIVE)? I can't tell, but hopefully that
shows what I meant, that we need further justification to grant write bit
only base on vma, rather than recovering write bit based on migration entry
type.

That's precisely what I had in mind recently, and I am happy to hear that you have similar idea:

https://lkml.kernel.org/r/20221108174652.198904-6-david@xxxxxxxxxx

"
Note that we don't optimize for the actual migration case:
(1) When migration succeeds the new PTE will not be writable because the
source PTE was not writable (protnone); in the future we
might just optimize that case similarly by reusing
can_change_pte_writable()/can_change_pmd_writable() when removing
migration PTEs.
"

Currently, "readable_migration_entry" is even wrong: it might be PROT_NONE and not even readable.

--
Thanks,

David / dhildenb