Re: [PATCH] mm/khugepaged: fix iteration in collapse_file

From: David Stevens
Date: Fri Jun 09 2023 - 21:50:52 EST


On Sat, Jun 10, 2023 at 5:03 AM Hugh Dickins <hughd@xxxxxxxxxx> wrote:
>
> On Wed, 7 Jun 2023, David Stevens wrote:
> >
> > Remove an unnecessary call to xas_set(index) when iterating over the
> > target range in collapse_file. The extra call to xas_set reset the xas
> > cursor to the top of the tree, causing the xas_next call on the next
> > iteration to walk the tree to index instead of advancing to index+1.
> > This returned the same page again, which would cause collapse_file to
> > fail because the page is already locked.
> >
> > This bug was hidden when CONFIG_DEBUG_VM was set. When that config was
> > used, the xas_load in a subsequent VM_BUG_ON assert would walk xas from
> > the top of the tree to index, causing the xas_next call on the next loop
> > iteration to advance the cursor as expected.
> >
> > Fixes: a2e17cc2efc7 ("mm/khugepaged: maintain page cache uptodate flag")
> > Signed-off-by: David Stevens <stevensd@xxxxxxxxxxxx>
>
> This patch seems to be wrong, but I have not investigated why.
>
> It's certainly an interesting and worrying observation,
> if a CONFIG_DEBUG_VM=y kernel goes a significantly different way.
>
> I almost always do have CONFIG_DEBUG_VM=y, so you won't be surprised that
> I never saw the issue. But once I ran an mm-everything with this patch in,
> I hit that VM_BUG_ON_PAGE(page != xas_load(&xas), page) for the first time
> (after about 2 hours of huge tmpfs swapping load).

Is the particular workload one you can share? I haven't hit that
assert so far with my tests.

Also, I'm a little surprised that this is the assert which is being
hit. My patch series didn't make that many changes to the first loop
of the function, and the changes it did make were mostly about missing
pages, not present pages.

> As if you have just transferred the problem from DEBUG_VM=n to DEBUG_VM=y.
> But I then tried a CONFIG_DEBUG_VM off 6.4-rc1 kernel (including the fixee
> but not this fixer) under similar load, and saw no problem in 14 hours.
> So I can't even reproduce the bug that is being fixed here: only hit a
> bug that it introduces.

The bug this fixes isn't a crash - it's the fact that khugepage
becomes nearly unable to collapse pages. Specifically, it can only
collapse if there is exactly one present page, and that page is at
index 511. Since MADV_COLLAPSE uses the same code path, it's easy to
reproduce the bug I'm trying to fix with this program:

#define _GNU_SOURCE
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <sys/mman.h>
#include <unistd.h>

#define THP_SIZE (2 * 1024 * 1024)

#ifndef MADV_COLLAPSE
#define MADV_COLLAPSE 25
#endif

int main() {
int memfd = memfd_create("memfd", MFD_CLOEXEC);
assert(memfd >= 0);

int ret = ftruncate(memfd, THP_SIZE);
assert(ret >= 0);

char *addr = mmap(NULL, THP_SIZE, PROT_READ|PROT_WRITE, MAP_SHARED, memfd, 0);
assert(addr != MAP_FAILED);

addr[0] = 0xff;

ret = madvise(addr, THP_SIZE, MADV_COLLAPSE);
assert(ret == 0);
}

If DEBUG_VM isn't set, then the test will trigger the assert. If
DEBUG_VM is set or if this fix is included, then the test will pass.

-David

> Hugh
>
> > ---
> > mm/khugepaged.c | 1 -
> > 1 file changed, 1 deletion(-)
> >
> > diff --git a/mm/khugepaged.c b/mm/khugepaged.c
> > index 6b9d39d65b73..2d0d58fb4e7f 100644
> > --- a/mm/khugepaged.c
> > +++ b/mm/khugepaged.c
> > @@ -2070,7 +2070,6 @@ static int collapse_file(struct mm_struct *mm, unsigned long addr,
> > TTU_IGNORE_MLOCK | TTU_BATCH_FLUSH);
> >
> > xas_lock_irq(&xas);
> > - xas_set(&xas, index);
> >
> > VM_BUG_ON_PAGE(page != xas_load(&xas), page);
> >
> > --
> > 2.41.0.rc2.161.g9c6817b8e7-goog