Re: [PATCH v2] mm/khugepaged: skip shmem with userfaultfd

From: David Stevens
Date: Mon Feb 06 2023 - 20:37:23 EST


On Tue, Feb 7, 2023 at 6:50 AM Matthew Wilcox <willy@xxxxxxxxxxxxx> wrote:
>
> On Mon, Feb 06, 2023 at 03:52:19PM -0500, Peter Xu wrote:
> > On Mon, Feb 06, 2023 at 07:01:39PM +0000, Matthew Wilcox wrote:
> > > On Mon, Feb 06, 2023 at 08:28:56PM +0900, David Stevens wrote:
> > > > This change first makes sure that the intermediate page cache state
> > > > during collapse is not visible by moving when gaps are filled to after
> > > > the page cache lock is acquired for the final time. This is necessary
> > > > because the synchronization provided by locking hpage is insufficient
> > > > for functions which operate on the page cache without actually locking
> > > > individual pages to examine their content (e.g. shmem_mfill_atomic_pte).
> > >
> > > I've been a little scared of touching khugepaged because, well, look at
> > > that function. But if we are going to touch it, how about this patch
> > > first? It does _part_ of what you need by not filling in the holes,
> > > but obviously not the part that looks at uffd.
> > >
> > > It leaves the old pages in-place and frozen. I think this should be
> > > safe, but I haven't booted it (not entirely sure what test I'd run
> > > to prove that it's not broken)
> >
> > That logic existed since Kirill's original commit to add shmem thp support
> > on khugepaged, so Kirill should be the best to tell.. but so far it seems
> > reasonable to me to have that extra operation.
> >
> > The problem is khugepaged will release pgtable lock during collapsing, so
> > AFAICT there can be a race where some other thread tries to insert pages
> > into page cache in parallel with khugepaged right after khugepaged released
> > the page cache lock.
> >
> > For example, it seems to me new page cache can be inserted when khugepaged
> > is copying small page content to the new hpage.

This particular race can't happen with either patch, since the missing
page cache entries are filled when we create the multi-index entry for
hpage.

> Mmm, yes, we need to have _something_ in the page cache to block new
> pages from being added. It can be either the new or the old pages,
> but it can't be NULL. It could even be a RETRY entry, since that'll
> have the same effect as a frozen page.
>
> But both David's patch and mine are wrong. Not sure what to do for
> David's problem -- maybe it's OK to have the holes temporarily filled
> with frozen / RETRY entries until we get to the point where we check
> for an uffd marker?

My patch re-counts the holes after acquiring the page cache lock for
the final time, right before creating the final hpage multi-index
entry. Since we lock present pages while iterating over the target
range, they can't have been truncated before our re-validation of
nr_none. So if the number of missing pages is still equal to nr_none,
then we know that nothing has come along and filled in a missing page.
Compared to adding some sort of marker for missing pages, this does
add another failure path for collapse, but I don't think there is any
race.
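
To make the argument concrete, here is a rough userspace sketch of the
re-validation idea. It is not the patch itself: struct toy_cache,
count_holes() and try_install_hpage() are made-up names, the "page cache"
is a plain array, and locking is only assumed in the comments.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define HPAGE_NR 8 /* toy value; the real range covers HPAGE_PMD_NR slots */

/* Hypothetical stand-in for the page cache range: NULL means a hole. */
struct toy_cache {
	void *slots[HPAGE_NR];
};

/* Count holes in the range. The caller is assumed to hold the cache lock. */
static int count_holes(const struct toy_cache *c)
{
	int holes = 0;

	for (size_t i = 0; i < HPAGE_NR; i++)
		if (c->slots[i] == NULL)
			holes++;
	return holes;
}

/*
 * Final step of the toy collapse, assumed to run with the cache lock held
 * for the last time. nr_none is the hole count seen during the earlier scan.
 * Present pages are assumed to be locked by the caller, so they cannot be
 * truncated; the hole count can therefore only shrink. If it still equals
 * nr_none, nothing filled a hole in between and it is safe to install hpage.
 */
static bool try_install_hpage(struct toy_cache *c, int nr_none, void *hpage)
{
	assert(nr_none >= 0 && nr_none <= HPAGE_NR);

	if (count_holes(c) != nr_none)
		return false; /* someone filled a hole: fail the collapse */

	for (size_t i = 0; i < HPAGE_NR; i++)
		c->slots[i] = hpage; /* stands in for the multi-index entry */
	return true;
}
```

The extra failure path mentioned above is the early "return false": the
collapse is abandoned rather than overwriting a page that raced in.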

-David