Re: [PATCH] mm/userfaultfd: fix memory corruption due to writeprotect

From: Nadav Amit
Date: Mon Dec 21 2020 - 13:33:31 EST


> On Dec 21, 2020, at 9:27 AM, Peter Xu <peterx@xxxxxxxxxx> wrote:
>
> Hi, Nadav,
>
> On Sun, Dec 20, 2020 at 12:06:38AM -0800, Nadav Amit wrote:
>
> [...]
>
>> So to correct myself, I think that what I really encountered was actually
>> during MM_CP_UFFD_WP_RESOLVE (i.e., when the protection is removed). The
>> problem was that in this case the “write”-bit was removed during unprotect.
>> Sorry for the strange formatting to fit within 80 columns:
>
> I assume I can ignore the race mentioned in the commit message but only refer
> to this one below. However I'm still confused. Please see below.
>
>> [ Start: PTE is writable ]
>>
>> cpu0 cpu1 cpu2
>> ---- ---- ----
>> [ Writable PTE
>> cached in TLB ]
>
> Here cpu2 got writable pte in tlb. But why?
>
> If below is an unprotect, it means it must have been protected once by
> userfaultfd, right? If so, the previous change_protection_range() which did
> the wr-protect should have done a tlb flush already before it returns (since
> pages>0 - we protected one pte at least). Then I can't see why cpu2 tlb has
> stall data.

Thanks, Peter. Just as you can munprotect() a region which was not protected
before, you can ufff-unprotect a region that was not protected before. It
might be that the user tried to unprotect a large region, which was
partially protected and partially unprotected.

The selftest obviously blindly unprotect some regions to check for bugs.

So to your question - it was not write-protected (think about initial copy
without write-protecting).

> If I assume cpu2 doesn't have that cached tlb, then "write to old page" won't
> happen either, because cpu1/cpu2 will all go through the cow path and pgtable
> lock should serialize them.
>
>> userfaultfd_writeprotect()
>> [ write-*unprotect* ]
>> mwriteprotect_range()
>> mmap_read_lock()
>> change_protection()
>>
>> change_protection_range()
>> ...
>> change_pte_range()
>> [ *clear* “write”-bit ]
>> [ defer TLB flushes]
>> [ page-fault ]
>> …
>> wp_page_copy()
>> cow_user_page()
>> [ copy page ]
>> [ write to old
>> page ]
>> …
>> set_pte_at_notify()
>>
>> [ End: cpu2 write not copied form old to new page. ]
>
> Could you share how to reproduce the problem? I would be glad to give it a
> shot as well.

You can run the selftests/userfaultfd with my small patch [1]. I ran it with
the following parameters: “ ./userfaultfd anon 100 100 “. I think that it is
more easily reproducible with “mitigations=off idle=poll” as kernel
parameters.

[1] https://lore.kernel.org/patchwork/patch/1346386/

>
>> [1] https://lore.kernel.org/patchwork/patch/1346386
>
> PS: Sorry to not have read the other series of yours. It seems to need some
> chunk of time so I postponed it a bit due to other things; but I'll read at
> least the fixes very soon.

Thanks again, I will post RFCv2 with some numbers soon.