Re: [syzbot] [mm?] WARNING: suspicious RCU usage in mas_walk (2)

From: Jann Horn
Date: Thu Jul 27 2023 - 15:21:16 EST


On Thu, Jul 27, 2023 at 8:17 PM Matthew Wilcox <willy@xxxxxxxxxxxxx> wrote:
>
> On Thu, Jul 27, 2023 at 07:59:33PM +0200, Jann Horn wrote:
> > On Thu, Jul 27, 2023 at 7:22 PM Suren Baghdasaryan <surenb@xxxxxxxxxx> wrote:
> > > Hmm. lock_vma_under_rcu() specifically checks for vma->anon_vma==NULL
> > > condition (see [1]) to avoid going into find_mergeable_anon_vma() (a
> > > check inside anon_vma_prepare() should prevent that). So, it should
> > > fall back to mmap_lock'ing.
> >
> > This syzkaller report applies to a tree with Willy's in-progress patch
> > series, where lock_vma_under_rcu() only checks for vma->anon_vma if
> > vma_is_anonymous() is true - it permits private non-anonymous VMAs
> > (which require an anon_vma for handling write faults) even if they
> > don't have an anon_vma.
> >
> > The commit bisected by syzkaller
> > (https://git.kernel.org/pub/scm/linux/kernel/git/next/linux-next.git/commit/?id=a52f58b34afe095ebc5823684eb264404dad6f7b)
> > removes the vma_is_anonymous() check in handle_pte_fault(), so it lets
> > us reach do_wp_page() with a non-anonymous private VMA without
> > anon_vma, even though that requires allocation of an anon_vma.
> >
> > So I think this is pretty clearly an issue with Willy's in-progress
> > patch series that syzkaller blamed correctly.
>
> Agreed. What do we think the right solution is?
>
> Option 1:
>
> +++ b/mm/memory.c
> @@ -3197,6 +3197,12 @@ static vm_fault_t wp_page_copy(struct vm_fault *vmf)
> struct mmu_notifier_range range;
> int ret;
>
> + if (!vma->anon_vma) {
> + // check if there are other things to undo here
> + vma_end_read(vmf->vma);
> + return VM_FAULT_RETRY;
> + }
> +
> delayacct_wpcopy_start();
>
> Option 2:
>
> @@ -5581,7 +5587,8 @@ struct vm_area_struct *lock_vma_under_rcu(struct mm_struct *mm,
> goto inval;
>
> /* find_mergeable_anon_vma uses adjacent vmas which are not locked */
> - if (vma_is_anonymous(vma) && !vma->anon_vma)
> + if ((vma_is_anonymous(vma) ||
> + vma->vm_flags & (VM_SHARED | VM_MAYSHARE)) && !vma->anon_vma)
> goto inval;
>
> The problem with option 2 is that we don't know whether this is a write
> fault or not, so we'll handle read faults on private file
> mappings under the mmap_lock UNTIL somebody writes to the mapping, which
> might be never. That seems like a bad idea.
>
> We could pass FAULT_FLAG_WRITE into lock_vma_under_rcu(), but that also
> seems like a bad idea. I dunno. Three bad ideas. Anyone think of a
> good one?

One kinda straightforward option would be to pass the vmf (or NULL if
it's not in fault context) to anon_vma_prepare(), teach it to bail if
it runs under the mm lock, and propagate a VM_FAULT_RETRY all the way
up? It can already fail due to OOM, so the bailout paths exist, though
you'd have to work a bit to plumb the right error code up.

And if you're feeling adventurous, you could try to build a way to
opportunistically upgrade from vma lock to mmap lock, to avoid having
to bail out all the way back up and then dive back in when that
happens. Something that does mmap_read_trylock(); on failure, bail out
with VM_FAULT_RETRY; on success, drop the VMA lock and change
vmf->flags to note the changed locking context.