Re: [PATCH] mm/userfaultfd: fix memory corruption due to writeprotect

From: Nadav Amit
Date: Tue Dec 22 2020 - 15:20:35 EST


> On Dec 22, 2020, at 11:44 AM, Andrea Arcangeli <aarcange@xxxxxxxxxx> wrote:
>
> On Mon, Dec 21, 2020 at 02:55:12PM -0800, Nadav Amit wrote:
>> wouldn’t mmap_write_downgrade() be executed before mprotect_fixup() (so
>
> I assume you mean "in" mprotect_fixup, after change_protection.
>
> If you would downgrade the mmap_lock to read there, then it'd severely
> slowdown the non contention case, if there's more than vma that needs
> change_protection.
>
> You'd need to throw away the prev->vm_next info and you'd need to do a
> new find_vma after droping the mmap_lock for reading and re-taking the
> mmap_lock for writing at every iteration of the loop.
>
> To do less harm to the non-contention case you could perhaps walk
> vma->vm_next and check if it's outside the mprotect range and only
> downgrade in such case. So let's assume we intend to optimize with
> mmap_write_downgrade only the last vma.


I read in detail your response and you make great points. To be fair,
I did not think it through and just tried to make a point that not
taking mmap_lock for write is an unrelated optimization.

So you make a great point that it is actually related and can only(?)
benefit uffd and arguably soft-dirty, to which I am going to add
mmap_write_lock().

Yet, my confidence in doing the optimization that you suggested (keep using
mmap_read_lock()) as part of the bug fix is very low and just got lower
since we discussed. So I do want in the future to try to avoid the overheads
I introduce (sorry), but it requires a systematic solution and some thought.

Perhaps any change to PTE in a page-table should increase a page-table
generation that we would save in the page-table page-struct (next to the
PTL) and pte_same() would also look at the original and updated "page-table
generation” when it checks if a PTE changed. So if a PTE went through
not-writable -> writable -> not-writable cycle without a TLB flush this can
go detected. Yet, there is still a question of how to detect pending TLB
flushes in finer granularity to avoid frequent unwarranted TLB flushes while
a TLB flush is pending.

It all requires some thought, and the fact that soft-dirty appears to be
broken too indicates that bugs can get unnoticed for some time.

Sorry for being a chicken, but I prefer to be safe than sorry.