Re: [PATCH] mm/userfaultfd: fix memory corruption due to writeprotect

From: Peter Xu
Date: Mon Dec 21 2020 - 17:32:29 EST


On Mon, Dec 21, 2020 at 01:49:55PM -0800, Nadav Amit wrote:
> BTW: In general, I think that you are right, and that changing of PTEs
> should not require taking mmap_lock for write. However, I am not sure
> cow_user_page() is not the only one that poses a problem and whether a more
> systematic solution is needed. If cow_user_pages() is the only problem, do
> you think it is possible to do the copying while holding the PTL? It works
> for normal-pages, but I am not sure whether special-pages pose special
> problems.
>
> Anyhow, this is an enhancement that we can try later.

AFAIU mprotect() is the only one who modifies the pte using the mmap write
lock. NUMA balancing is also using read mmap lock when changing pte
protections, while my understanding is mprotect() used write lock only because
it manipulates the address space itself (aka. vma layout) rather than modifying
the ptes, so it needs to.

At the pte level, it seems always to be the pgtable lock that serializes things.

So it's perfectly legal to me for e.g. a driver to modify ptes with the read
lock of mmap_sem, unless I'm severely mistaken.. as long as the pgtable lock is
taken when doing so.

If there's a driver that manipulated the ptes, changed the content of the page,
recover the ptes to origin, and all these happen right after wp_page_copy()
unlocked the pgtable lock but before wp_page_copy() retakes the same lock
again, we may face the same issue finding that the page got copied contains
corrupted data at last. While I don't know what to blame on the driver either
because it seems to be exactly following the rules.

I believe changing into write lock would solve the race here because tlb
flushing would be guaranteed along the way, but I'm just a bit worried it's not
the best way to go..

Thanks,

--
Peter Xu