Re: [PATCH] mm/mmap/vma_merge: always check invariants

From: Mike Rapoport
Date: Thu May 11 2023 - 14:09:41 EST


(adding Peter)

On Wed, May 10, 2023 at 09:26:10AM -0700, Lorenzo Stoakes wrote:
> On Wed, May 10, 2023 at 05:17:49PM +0100, Mark Rutland wrote:
> > On Wed, May 10, 2023 at 09:04:44AM -0700, Lorenzo Stoakes wrote:
> > > On Wed, May 10, 2023 at 03:15:51PM +0100, Mark Rutland wrote:
> > > > Hi,
> > > >
> > > > On Sun, Apr 30, 2023 at 09:19:17PM +0100, Lorenzo Stoakes wrote:
> > > > > We may still have inconsistent input parameters even if we choose not to
> > > > > merge and the vma_merge() invariant checks are useful for checking this
> > > > > with no production runtime cost (these are only relevant when
> > > > > CONFIG_DEBUG_VM is specified).
> > > > >
> > > > > Therefore, perform these checks regardless of whether we merge.
> > > > >
> > > > > This is relevant, as a recent issue (addressed in commit "mm/mempolicy:
> > > > > Correctly update prev when policy is equal on mbind") in the mbind logic
> > > > > was only picked up in the 6.2.y stable branch where these assertions are
> > > > > performed prior to determining mergeability.
> > > > >
> > > > > Had this remained the same in mainline this issue may have been picked up
> > > > > faster, so moving forward let's always check them.
> > > > >
> > > > > Signed-off-by: Lorenzo Stoakes <lstoakes@xxxxxxxxx>
> > > > > ---
> > > > > mm/mmap.c | 10 +++++-----
> > > > > 1 file changed, 5 insertions(+), 5 deletions(-)
> > > > >
> > > > > diff --git a/mm/mmap.c b/mm/mmap.c
> > > > > index 5522130ae606..13678edaa22c 100644
> > > > > --- a/mm/mmap.c
> > > > > +++ b/mm/mmap.c
> > > > > @@ -960,17 +960,17 @@ struct vm_area_struct *vma_merge(struct vma_iterator *vmi, struct mm_struct *mm,
> > > > > merge_next = true;
> > > > > }
> > > > >
> > > > > + /* Verify some invariant that must be enforced by the caller. */
> > > > > + VM_WARN_ON(prev && addr <= prev->vm_start);
> > > > > + VM_WARN_ON(curr && (addr != curr->vm_start || end > curr->vm_end));
> > > > > + VM_WARN_ON(addr >= end);
> > > > > +
> > > >
> > > > I'm seeing this fire a lot when fuzzing v6.4-rc1 on arm64 using Syzkaller.
> > > >
> > >
> > > Thanks, from the line I suspect addr != curr->vm_start, but need to look
> > > into the repro, at lsf/mm so a bit time lagged :)
> >
> > No problem; FWIW I can confirm your theory, the reproducer is causing:
> >
> > addr > curr->vm_start
> >
> > ... confirmed with the following hack, log below.
>
> Awesome thanks for that! Just been firing up qemu to do this.
>
> Cases 5-8 should really have addr == curr->vm_start, I wonder if it's
> another case but curr is being set incorrectly, it should in theory not be
> the case.

AFAIU, it's a case of "adjust vma, but don't merge, because prev is not
compatible". Looks like uffd first attempts to merge the newly registered
range with compatible adjacent vmas, relying on there being no merge when
addr != curr->vm_start, and only after the merge attempt does it split the
edges.

I think that moving the split in fs/userfaultfd.c:1495 (as of v6.4-rc1)
before vma_merge() will be the right fix.
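
To illustrate why the ordering matters, here is a rough userspace model,
not kernel code. struct range, merge_args_valid() and split_at() are
made-up names for this sketch; only the three conditions mirror the
VM_WARN_ON()s in the patch. addr and the vma start are taken from Mark's
log, while the vma end is a hypothetical value, since the log does not
show it.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Simplified stand-in for struct vm_area_struct: only the [start, end) range. */
struct range {
	uint64_t start;
	uint64_t end;
};

/*
 * Model of the caller invariants asserted in vma_merge().
 * prev and curr may be NULL, as in the kernel code.
 */
static bool merge_args_valid(const struct range *prev, const struct range *curr,
			     uint64_t addr, uint64_t end)
{
	if (prev && addr <= prev->start)
		return false;
	if (curr && (addr != curr->start || end > curr->end))
		return false;
	if (addr >= end)
		return false;
	return true;
}

/*
 * Hypothetical split helper: cut *vma at addr, keep the low part in *vma
 * and return the high part.
 */
static struct range split_at(struct range *vma, uint64_t addr)
{
	struct range high = { addr, vma->end };

	vma->end = addr;
	return high;
}

/* Replays both orderings on a single vma. */
static void demo(void)
{
	/* start is from the log; end is an assumed value. */
	struct range vma = { 0x0000ffff99db2000ULL, 0x0000ffff99dd2000ULL };
	uint64_t addr = 0x0000ffff99dc2000ULL;	/* from the log */
	uint64_t end = vma.end;
	struct range curr;

	/* Merge attempt before the split: addr != curr->start, check fails. */
	assert(!merge_args_valid(NULL, &vma, addr, end));

	/* Split first: the low part becomes prev, the high part becomes curr. */
	curr = split_at(&vma, addr);
	assert(merge_args_valid(&vma, &curr, addr, end));
}
```

With the split done first, curr starts exactly at addr, so the checks
hold whether or not the merge goes ahead.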

> (See [1] for a visualisation of merge cases as a handy reference)
>
> Of course userfaultfd might be the offender here and might be relying on no
> merge case arising but passing dodgy parameters.
>
> [1]:https://ljs.io/merge_cases.png

You really should put it into Documentation/mm ;-)

> >
> > | diff --git a/mm/mmap.c b/mm/mmap.c
> > | index 13678edaa22c..2cdebba15719 100644
> > | --- a/mm/mmap.c
> > | +++ b/mm/mmap.c
> > | @@ -961,9 +961,21 @@ struct vm_area_struct *vma_merge(struct vma_iterator *vmi, struct mm_struct *mm,
> > | }
> > |
> > | /* Verify some invariant that must be enforced by the caller. */
> > | - VM_WARN_ON(prev && addr <= prev->vm_start);
> > | - VM_WARN_ON(curr && (addr != curr->vm_start || end > curr->vm_end));
> > | - VM_WARN_ON(addr >= end);
> > | + VM_WARN(prev && addr <= prev->vm_start,
> > | + "addr = 0x%016lx, prev->vm_start = 0x%016lx\n",
> > | + addr, prev->vm_start);
> > | +
> > | + VM_WARN(curr && addr != curr->vm_start,
> > | + "addr = 0x%016lx, curr->vm_start = 0x%016lx\n",
> > | + addr, curr->vm_start);
> > | +
> > | + VM_WARN(curr && addr > curr->vm_end,
> > | + "addr = 0x%016lx, curr->vm_end = 0x%016lx\n",
> > | + addr, curr->vm_end);
> > | +
> > | + VM_WARN(addr >= end,
> > | + "addr = 0x%016lx, end = 0x%016lx\n",
> > | + addr, end);
> > |
> > | if (!merge_prev && !merge_next)
> > | return NULL; /* Not mergeable. */
> >
> > ... with that applied, running the reproducer results in:
> >
> > | addr = 0x0000ffff99dc2000, curr->vm_start = 0x0000ffff99db2000
> > | WARNING: CPU: 0 PID: 163 at mm/mmap.c:968 vma_merge+0x3d4/0x1260
> >
> > ... i.e. addr > curr->vm_start
> >
> > Thanks,
> > Mark.
>