Re: [PATCH v2 1/2] mm, thp: check page mapping when truncating page cache

From: Matthew Wilcox
Date: Mon Oct 04 2021 - 15:35:34 EST


On Mon, Oct 04, 2021 at 11:28:45AM -0700, Yang Shi wrote:
> On Sat, Oct 2, 2021 at 10:09 AM Matthew Wilcox <willy@xxxxxxxxxxxxx> wrote:
> >
> > On Thu, Sep 30, 2021 at 10:39:14AM -0700, Yang Shi wrote:
> > > On Thu, Sep 30, 2021 at 9:49 AM Hugh Dickins <hughd@xxxxxxxxxx> wrote:
> > > > I assume you're thinking of one of the fuzzer blkdev ones:
> > > > https://lore.kernel.org/linux-mm/CACkBjsbtF_peC7N_4mRfHML_BeiPe+O9DahTfr84puSG_J9rcQ@xxxxxxxxxxxxxx/
> > > > or
> > > > https://lore.kernel.org/lkml/CACkBjsYwLYLRmX8GpsDpMthagWOjWWrNxqY6ZLNQVr6yx+f5vA@xxxxxxxxxxxxxx/
> > > >
> > > > I haven't started on those ones yet: yes, I imagine one or both of those
> > > > will need a further fix (S_ISREG() check somewhere if we're lucky; but
> > > > could well be nastier); but for the bug in this thread, I expect
> > >
> > > Makes sense to me. We should be able to check S_ISREG() in khugepaged,
> > > if it is not a regular file, just bail out. Sounds not that nasty to
> > > me AFAIU.
> >
> > I don't see why we should have an S_ISREG() check. I agree it's not the
> > intended usecase, but it ought to work fine. Unless there's something
> > I'm missing?
>
> Check out this bug report:
> https://lore.kernel.org/lkml/CACkBjsYwLYLRmX8GpsDpMthagWOjWWrNxqY6ZLNQVr6yx+f5vA@xxxxxxxxxxxxxx/
> and the patch from me:
> https://lore.kernel.org/linux-mm/20210917205731.262693-1-shy828301@xxxxxxxxx/
>
> I don't think we handle buffers correctly for file THP, right? My
> patch is ad hoc, so I thought Hugh's suggestion makes some sense to
> me. Why do we have THP collapsed for unintended usecase in the first
> place?

OK, I've done some more digging. I think what's going on with this
report is userspace opens the block device RO, causes the page cache to
be loaded with data, then khugepaged comes in and creates THPs.
What confuses me is that these THPs have private data attached to them.
I don't know how that happens. If it's block device specific, then
yes, something like your S_ISREG() idea should work fine. Otherwise,
we might need to track down another problem.

Hao Sun, can you try this patch and see what comes out of it?

diff --git a/fs/buffer.c b/fs/buffer.c
index c615387aedca..fefe05a9973d 100644
--- a/fs/buffer.c
+++ b/fs/buffer.c
@@ -872,6 +872,7 @@ link_dev_buffers(struct page *page, struct buffer_head *head)
bh = bh->b_this_page;
} while (bh);
tail->b_this_page = head;
+ VM_BUG_ON_PAGE(PageCompound(page), page);
attach_page_private(page, head);
}

@@ -1577,6 +1578,7 @@ void create_empty_buffers(struct page *page,
bh = bh->b_this_page;
} while (bh != head);
}
+ VM_BUG_ON_PAGE(PageCompound(page), page);
attach_page_private(page, head);
spin_unlock(&page->mapping->private_lock);
}
@@ -2565,6 +2567,7 @@ static void attach_nobh_buffers(struct page *page, struct buffer_head *head)
bh->b_this_page = head;
bh = bh->b_this_page;
} while (bh != head);
+ VM_BUG_ON_PAGE(PageCompound(page), page);
attach_page_private(page, head);
spin_unlock(&page->mapping->private_lock);
}