RE: [PATCH] mm/mmap: move vma operations to mm_struct out of the critical section of file mapping lock

From: Ma, Yu
Date: Wed Jun 21 2023 - 10:21:07 EST


+ Christoph Hellwig in To list

Hi Chris,
Sorry that it is pretty long ago, do we have any idea about why put vma operations to mm struct in the critical section of file mapping lock (i.e vma_link() as below, the operations to mm struct like __vma_link_list and __vma_link_rb is protected by mapping lock). We'd like to study more here if possible, sincerely appreciate for any hint or feedback.
https://github.com/mpe/linux-fullhistory/commit/bbbce8f41d3da0ac968bab7a967e12e2be1a7eb0


Regards
Yu

> > * Liam R. Howlett <Liam.Howlett@xxxxxxxxxx> [230606 15:20]:
> > > * Yu Ma <yu.ma@xxxxxxxxx> [230606 08:23]:
> > > > UnixBench/Execl represents a class of workload where bash scripts
> > > > are spawned frequently to do some short jobs. When running
> > > > multiple parallel tasks, hot osq_lock is observed from do_mmap and
> exit_mmap.
> > > > Both of them come from load_elf_binary through the call chain
> > > > "execl->do_execveat_common->bprm_execve->load_elf_binary". In
> > > > do_mmap,it will call mmap_region to create vma node, initialize it
> > > > and insert it to vma maintain structure in mm_struct and i_mmap
> > > > tree of the mapping file, then increase map_count to record the
> > > > number of vma nodes used. The hot osq_lock is to protect
> > > > operations on file’s i_mmap tree. For the mm_struct member change
> > > > like vma insertion and map_count update, they do not affect i_mmap
> > > > tree. Move those operations out of the lock's critical section, to
> > > > reduce hold time on the
> > lock.
> > > >
> > > > With this change, on Intel Sapphire Rapids 112C/224T platform,
> > > > based on v6.0-rc6, the 160 parallel score improves by 12%. The
> > > > patch has no obvious performance gain on v6.4-rc4 due to
> > > > regression of this benchmark from this commit
> > f1a7941243c102a44e8847e3b94ff4ff3ec56f25
> > > > (mm: convert mm's rss stats into percpu_counter).
> > >
> > > I didn't think it was safe to insert a VMA into the VMA tree without
> > > holding this write lock? We now have a window of time where a file
> > > mapping doesn't exist for a vma that's in the tree? Is this always
> > > safe? Does the locking order in mm/rmap.c need to change?
> >
> > So I'm pretty sure it's not safe because we've been ensuring that this
> > lock was taken during vma tree inserts since 2002 [1]. Take a look at
> > vma_link() in that commit. I still don't have an answer as to why
> > it's not safe though.
> >
> > [1] https://github.com/mpe/linux-
> > fullhistory/commit/bbbce8f41d3da0ac968bab7a967e12e2be1a7eb0
> >
>
> Thanks Liam for your quick review and digging in related code. I just checked
> vma_link() in 2002, the file lock is there to protect __vma_link(), and in
> __vma_link(), there are 3 functions, the first 2 are operations to insert vma to
> mm_struct, and the last func __vma_link_file() is to insert vma to the file
> mapping tree. So this file lock around __vma_link() makes sense since it has
> operations of file mapping tree inside. It still cannot explain why the
> operations to mm_struct cannot be moved out.
>
> > >
> > > >Related discussion and conclusion
> > > > can be referred at the mail thread initiated by 0day as below:
> > > > Link:
> > > >https://lore.kernel.org/linux-mm/a4aa2e13-7187-600b-c628-
> > 7e8fb108def0
> > > >@intel.com/
> > >
> > > I don't see a conclusion on that thread talking about changing the
> > > locking order?
> I may not intro this link clear, it is about why no obvious improvement
> observed on latest kernel for the time being :)
>
> > >
> > > >
> > > > Reviewed-by: Tim Chen <tim.c.chen@xxxxxxxxxxxxxxx>
> > > > Signed-off-by: Yu Ma <yu.ma@xxxxxxxxx>
> > > > ---
> > > > mm/mmap.c | 4 +---
> > > > 1 file changed, 1 insertion(+), 3 deletions(-)
> > > >
> > > > diff --git a/mm/mmap.c b/mm/mmap.c index
> > > > 13678edaa22c..0e694a0433bc 100644
> > > > --- a/mm/mmap.c
> > > > +++ b/mm/mmap.c
> > > > @@ -2711,12 +2711,10 @@ unsigned long mmap_region(struct file
> > > > *file,
> > unsigned long addr,
> > > > if (vma_iter_prealloc(&vmi))
> > > > goto close_and_free_vma;
> > > >
> > > > - if (vma->vm_file)
> > > > - i_mmap_lock_write(vma->vm_file->f_mapping);
> > > > -
> > > > vma_iter_store(&vmi, vma);
> > > > mm->map_count++;
> > > > if (vma->vm_file) {
> > > > + i_mmap_lock_write(vma->vm_file->f_mapping);
> > > > if (vma->vm_flags & VM_SHARED)
> > > > mapping_allow_writable(vma->vm_file->f_mapping);
> > > >
> > > > --
> > > > 2.39.3
> > > >
> > > >