Re: [RFC PATCH v2 08/47] hugetlb: add HGM enablement functions

From: James Houghton
Date: Thu Dec 15 2022 - 13:08:40 EST


On Thu, Dec 15, 2022 at 12:52 PM Mike Kravetz <mike.kravetz@xxxxxxxxxx> wrote:
>
> On 12/13/22 10:49, James Houghton wrote:
> > On Mon, Dec 12, 2022 at 7:14 PM Mike Kravetz <mike.kravetz@xxxxxxxxxx> wrote:
> > >
> > > On 10/21/22 16:36, James Houghton wrote:
> > > > Currently it is possible for all shared VMAs to use HGM, but it must be
> > > > enabled first. This is because with HGM, we lose PMD sharing, and page
> > > > table walks require additional synchronization (we need to take the VMA
> > > > lock).
> > >
> > > Not sure yet, but I expect Peter's series will help with locking for
> > > hugetlb specific page table walks.
> >
> > It should make things a little bit cleaner in this series; I'll rebase
> > HGM on top of those patches this week (and hopefully get a v1 out
> > soon).
> >
> > I don't think it's possible to implement MADV_COLLAPSE with RCU alone
> > (as implemented in Peter's series anyway); we still need the VMA lock.
>
> As I continue going through the series, I realize that I am not exactly
> sure what synchronization by the vma lock is required by HGM. As you are
> aware, it was originally designed to protect against someone doing a
> pmd_unshare and effectively removing part of the page table. However,
> since pmd sharing is disabled for vmas with HGM enabled (I think?), then
> it might be a good idea to explicitly say somewhere the reason for using
> the lock.

The VMA lock synchronizes high-granularity page table walks against
MADV_COLLAPSE for hugetlb (hugetlb_collapse()). MADV_COLLAPSE takes it
for writing and frees some page table pages, and high-granularity walks
generally take it for reading. I'll make this clear in a comment
somewhere and in the commit messages.

It might be easier if hugetlb_collapse() had the exact same
synchronization as huge_pmd_unshare(), where we not only take the VMA
lock for writing, we also take the i_mmap_rwsem for writing, so
anywhere hugetlb_walk() is safe, high-granularity walks are also
safe. I think I should just do that for the sake of simplicity.

- James

> --
> Mike Kravetz