Re: [PATCH v2] mm/khugepaged: skip shmem with userfaultfd

From: Matthew Wilcox
Date: Mon Feb 06 2023 - 16:51:32 EST


On Mon, Feb 06, 2023 at 03:52:19PM -0500, Peter Xu wrote:
> On Mon, Feb 06, 2023 at 07:01:39PM +0000, Matthew Wilcox wrote:
> > On Mon, Feb 06, 2023 at 08:28:56PM +0900, David Stevens wrote:
> > > This change first makes sure that the intermediate page cache state
> > > during collapse is not visible by moving when gaps are filled to after
> > > the page cache lock is acquired for the final time. This is necessary
> > > because the synchronization provided by locking hpage is insufficient
> > > for functions which operate on the page cache without actually locking
> > > individual pages to examine their content (e.g. shmem_mfill_atomic_pte).
> >
> > I've been a little scared of touching khugepaged because, well, look at
> > that function. But if we are going to touch it, how about this patch
> > first? It does _part_ of what you need by not filling in the holes,
> > but obviously not the part that looks at uffd.
> >
> > It leaves the old pages in-place and frozen. I think this should be
> > safe, but I haven't booted it (not entirely sure what test I'd run
> > to prove that it's not broken)
>
> That logic existed since Kirill's original commit to add shmem thp support
> on khugepaged, so Kirill should be the best to tell.. but so far it seems
> reasonable to me to have that extra operation.
>
> The problem is khugepaged will release pgtable lock during collapsing, so
> AFAICT there can be a race where some other thread tries to insert pages
> into page cache in parallel with khugepaged right after khugepaged released
> the page cache lock.
>
> For example, it seems to me new page cache can be inserted when khugepaged
> is copying small page content to the new hpage.

Mmm, yes, we need to have _something_ in the page cache to block new
pages from being added. It can be either the new or the old pages,
but it can't be NULL. It could even be a RETRY entry, since that'll
have the same effect as a frozen page.
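
To make the blocking rule concrete, here is a toy userspace model, not
kernel code. Every name in it (toy_cache, toy_add_page, TOY_RETRY,
toy_block_holes, toy_unblock_holes) is made up for illustration, and a
plain array stands in for the xarray. The only behaviour it assumes is
the one described above: an insert fails unless the slot is NULL.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

#define TOY_NSLOTS 8

/* Hypothetical sentinel standing in for a RETRY entry (an assumption,
 * not the kernel's representation). */
static char toy_retry_marker;
#define TOY_RETRY ((void *)&toy_retry_marker)

/* Toy page cache: one pointer per index, NULL means "hole". */
struct toy_cache {
	void *slot[TOY_NSLOTS];
};

/* A racing inserter: succeeds only if the slot is a hole. */
int toy_add_page(struct toy_cache *c, size_t idx, void *page)
{
	assert(idx < TOY_NSLOTS);
	if (c->slot[idx] != NULL)
		return -EEXIST;	/* old page, new page or placeholder */
	c->slot[idx] = page;
	return 0;
}

/* Fill every hole in [start, start + n) with the placeholder so that
 * nothing can be inserted there.  Returns the number of holes filled. */
size_t toy_block_holes(struct toy_cache *c, size_t start, size_t n)
{
	size_t i, filled = 0;

	assert(start + n <= TOY_NSLOTS);
	for (i = start; i < start + n; i++) {
		if (c->slot[i] == NULL) {
			c->slot[i] = TOY_RETRY;
			filled++;
		}
	}
	return filled;
}

/* Roll back: turn placeholders in the range into holes again and
 * leave real pages alone. */
void toy_unblock_holes(struct toy_cache *c, size_t start, size_t n)
{
	size_t i;

	assert(start + n <= TOY_NSLOTS);
	for (i = start; i < start + n; i++) {
		if (c->slot[i] == TOY_RETRY)
			c->slot[i] = NULL;
	}
}
```

With only index 0 populated, toy_block_holes(&c, 0, 4) fills the three
holes, and a toy_add_page() on index 1 then gets -EEXIST until
toy_unblock_holes() has run over the same range.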

But both David's patch and mine are wrong. Not sure what to do for
David's problem -- maybe it's OK to have the holes temporarily filled
with frozen / RETRY entries until we get to the point where we check
for an uffd marker?