Re: [PATCH v2 2/2] mm: prevent gup_fast from racing with COW during fork

From: Jan Kara
Date: Fri Oct 30 2020 - 12:51:09 EST


On Fri 30-10-20 11:46:21, Jason Gunthorpe wrote:
> Since commit 70e806e4e645 ("mm: Do early cow for pinned pages during
> fork() for ptes") pages under a FOLL_PIN will not be write protected
> during COW for fork. This means that pages returned from
> pin_user_pages(FOLL_WRITE) should not become write protected while the pin
> is active.
>
> However, there is a small race where get_user_pages_fast(FOLL_PIN) can
> establish a FOLL_PIN at the same time copy_present_page() is write
> protecting it:
>
> CPU 0 CPU 1
> get_user_pages_fast()
> internal_get_user_pages_fast()
> copy_page_range()
> pte_alloc_map_lock()
> copy_present_page()
> atomic_read(has_pinned) == 0
> page_maybe_dma_pinned() == false
> atomic_set(has_pinned, 1);
> gup_pgd_range()
> gup_pte_range()
> pte_t pte = gup_get_pte(ptep)
> pte_access_permitted(pte)
> try_grab_compound_head()
> pte = pte_wrprotect(pte)
> set_pte_at();
> pte_unmap_unlock()
> // GUP now returns with a write protected page
>
> The first attempt to resolve this by using the write protect caused
> problems (and was missing a barrrier), see commit f3c64eda3e50 ("mm: avoid
> early COW write protect games during fork()")
>
> Instead wrap copy_p4d_range() with the write side of a seqcount and check
> the read side around gup_pgd_range(). If there is a collision then
> get_user_pages_fast() fails and falls back to slow GUP.
>
> Slow GUP is safe against this race because copy_page_range() is only
> called while holding the exclusive side of the mmap_lock on the src
> mm_struct.
>
> Fixes: f3c64eda3e50 ("mm: avoid early COW write protect games during fork()")
> Suggested-by: Linus Torvalds <torvalds@xxxxxxxxxxxxxxxxxxxx>
> Link: https://lore.kernel.org/r/CAHk-=wi=iCnYCARbPGjkVJu9eyYeZ13N64tZYLdOB8CP5Q_PLw@xxxxxxxxxxxxxx
> Reviewed-by: John Hubbard <jhubbard@xxxxxxxxxx>
> Signed-off-by: Jason Gunthorpe <jgg@xxxxxxxxxx>

Looks good to me. Just one nit below. With that fixed feel free to add:

Reviewed-by: Jan Kara <jack@xxxxxxx>

> @@ -446,6 +447,12 @@ struct mm_struct {
> */
> atomic_t has_pinned;
>
> + /**
> + * @write_protect_seq: Odd when any thread is write protecting
> + * pages in this mm, for instance during fork().
> + */
> + seqcount_t write_protect_seq;
> +

So this comment isn't quite true. We can be writeprotecting pages due to
many other reasons and not touch write_protect_seq. E.g. for shared
mappings or due to explicit mprotect() calls. So the write_protect_seq
protection has to be about something more than pure write protection. One
requirement certainly is that the VMA has to be is_cow_mapping(). What
about mprotect(2) calls? I guess the application would have only itself to
blame so we don't care?

Anyway my point is just that the comment should tell more what this is
about. I'd even go as far as making it "page table copying during fork in
progress".

Honza
--
Jan Kara <jack@xxxxxxxx>
SUSE Labs, CR