Re: [PATCH v3 15/15] mm/mmap: Change vma iteration order in do_vmi_align_munmap()

From: Andrew Morton
Date: Mon Aug 14 2023 - 15:11:54 EST


On Mon, 14 Aug 2023 17:43:39 +0200 Jann Horn <jannh@xxxxxxxxxx> wrote:

> @akpm
>
> On Mon, Jul 24, 2023 at 8:31 PM Liam R. Howlett <Liam.Howlett@xxxxxxxxxx> wrote:
> > Since prev will be set later in the function, it is better to reverse
> > the splitting direction of the start VMA (modify the new_below argument
> > to __split_vma).
>
> It might be a good idea to reorder "mm: always lock new vma before
> inserting into vma tree" before this patch.
>
> If you apply this patch without "mm: always lock new vma before
> inserting into vma tree", I think move_vma(), when called with a start
> address in the middle of a VMA, will behave like this:
>
> - vma_start_write() [lock the VMA to be moved]
> - move_page_tables() [moves page table entries]
> - do_vmi_munmap()
>   - do_vmi_align_munmap()
>     - __split_vma()
>       - creates a new VMA **covering the moved range** that is **not locked**
>       - stores the new VMA in the VMA tree **without locking it** [1]
>     - new VMA is locked and removed again [2]
> [...]
>
> So after the page tables in the region have already been moved, I
> believe there will be a brief window (between [1] and [2]) where page
> faults in the region can happen again, which could probably cause new
> page tables and PTEs to be created in the region again in that window.
> (This can't happen in Linus' current tree because the new VMA created
> by __split_vma() only covers the range that is not being moved.)
>
> Though I guess that's not going to lead to anything bad, since
> do_vmi_munmap() anyway cleans up PTEs and page tables in the region?
> So maybe it's not that important.

Thanks. I'd of course prefer not to rebuild mm-stable. If this ends
up being a hard-to-hit issue during git-bisect searches, I think we can
live with that.
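
For readers following the ordering argument above, here is a toy user-space model of the window between [1] and [2]. It is only a sketch: ToyVma, fault_can_proceed() and split_then_unmap() are hypothetical names invented for illustration, the addresses are arbitrary, and nothing here reflects the real kernel data structures or locking primitives. It assumes a fault path that can proceed only on a VMA that is visible in the tree and not write-locked, and it compares inserting the new VMA before locking it with locking it first (the ordering that "mm: always lock new vma before inserting into vma tree" is described as enforcing).

```python
from dataclasses import dataclass


@dataclass
class ToyVma:
    """Hypothetical stand-in for a VMA; not the kernel's vm_area_struct."""
    start: int
    end: int
    write_locked: bool = False
    in_tree: bool = False


def fault_can_proceed(vma, addr):
    """Toy fault path: needs a visible, unlocked VMA covering addr."""
    return vma.in_tree and not vma.write_locked and vma.start <= addr < vma.end


def split_then_unmap(lock_before_insert, addr):
    """Return the steps at which a concurrent fault at addr could proceed."""
    # Assumption: the new VMA produced by the split covers the moved range.
    new_vma = ToyVma(start=0x2000, end=0x3000)
    open_steps = []

    def probe(step):
        # Record the step if a fault arriving right now would be allowed.
        if fault_can_proceed(new_vma, addr):
            open_steps.append(step)

    probe("created")
    if lock_before_insert:
        new_vma.write_locked = True
        probe("locked before insert")
    new_vma.in_tree = True
    probe("inserted [1]")
    new_vma.write_locked = True
    probe("locked for removal [2]")
    new_vma.in_tree = False
    probe("removed")
    return open_steps


if __name__ == "__main__":
    print("insert, then lock:", split_then_unmap(False, 0x2800))
    print("lock, then insert:", split_then_unmap(True, 0x2800))
```

In this model the only step at which a fault can get through is the one between insertion and locking, and it disappears when the lock is taken first. The model says nothing about whether such a fault would be harmful, which is the open question in the quoted message.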