Re: [PATCH RFC 04/12] mm: Introduce vma_pgtable_walk_{begin|end}()

From: Peter Xu
Date: Fri Nov 24 2023 - 10:34:17 EST


On Fri, Nov 24, 2023 at 09:32:13AM +0530, Aneesh Kumar K.V wrote:
> Peter Xu <peterx@xxxxxxxxxx> writes:
>
> > Introduce per-vma begin()/end() helpers for pgtable walks. This is
> > preparation work to merge hugetlb pgtable walkers with generic mm.
> >
> > The helpers need to be called before and after a pgtable walk, and will
> > become necessary once the pgtable walker code supports hugetlb pages.
> > It's a hook point for any type of VMA, but for now only hugetlb uses it
> > to stabilize the pgtable pages so they cannot go away (due to possible
> > pmd unsharing).
> >
> > Signed-off-by: Peter Xu <peterx@xxxxxxxxxx>
> > ---
> > include/linux/mm.h | 3 +++
> > mm/memory.c | 12 ++++++++++++
> > 2 files changed, 15 insertions(+)
> >
> > diff --git a/include/linux/mm.h b/include/linux/mm.h
> > index 64cd1ee4aacc..349232dd20fb 100644
> > --- a/include/linux/mm.h
> > +++ b/include/linux/mm.h
> > @@ -4154,4 +4154,7 @@ static inline bool pfn_is_unaccepted_memory(unsigned long pfn)
> > return range_contains_unaccepted_memory(paddr, paddr + PAGE_SIZE);
> > }
> >
> > +void vma_pgtable_walk_begin(struct vm_area_struct *vma);
> > +void vma_pgtable_walk_end(struct vm_area_struct *vma);
> > +
> > #endif /* _LINUX_MM_H */
> > diff --git a/mm/memory.c b/mm/memory.c
> > index e27e2e5beb3f..3a6434b40d87 100644
> > --- a/mm/memory.c
> > +++ b/mm/memory.c
> > @@ -6180,3 +6180,15 @@ void ptlock_free(struct ptdesc *ptdesc)
> > kmem_cache_free(page_ptl_cachep, ptdesc->ptl);
> > }
> > #endif
> > +
> > +void vma_pgtable_walk_begin(struct vm_area_struct *vma)
> > +{
> > + if (is_vm_hugetlb_page(vma))
> > + hugetlb_vma_lock_read(vma);
> > +}
> >
>
> That is required only if we support pmd sharing?

Correct.

Note that for this specific gup code path we're not changing the locking
behavior: hugetlb_follow_page_mask() already called hugetlb_vma_lock_read()
in the same way, and also unconditionally.

Things get even more complicated if we look at the recent private mapping
change that Rik introduced in bf4916922c. I think it means we'll also take
that lock if the private lock is allocated, but I'm not really sure whether
that's necessary for all pgtable walks, since the hugetlb vma lock is
currently taken in most walk paths and only some special paths take the
i_mmap rwsem instead of the vma lock.

Per my current understanding, the private lock was only for avoiding a race
between truncate & zapping. I have a feeling there may be a better way to
do this than making the same lock (or lock API) serve different purposes.

In summary, the hugetlb vma lock is still complicated and may be subject to
further refactoring, but all of that needs further investigation. This
series can hopefully be seen as completely separate from that for now.

Thanks,

--
Peter Xu