Re: [PATCH] mm/memory-failure: fix hardware poison check in unpoison_memory()

From: Naoya Horiguchi
Date: Tue Jul 18 2023 - 20:01:38 EST


On Tue, Jul 18, 2023 at 07:30:23AM -0700, Sidhartha Kumar wrote:
> On 7/17/23 5:39 PM, Naoya Horiguchi wrote:
> > On Tue, Jul 18, 2023 at 09:14:09AM +0900, Naoya Horiguchi wrote:
> > > On Mon, Jul 17, 2023 at 11:18:12AM -0700, Sidhartha Kumar wrote:
> > > > It was pointed out[1] that using folio_test_hwpoison() is wrong
> > > > as we need to check the indiviual page that has poison.
> > > > folio_test_hwpoison() only checks the head page so go back to using
> > > > PageHWPoison().
> > > >
> > > > Reported-by: Matthew Wilcox (Oracle) <willy@xxxxxxxxxxxxx>
> > > > Fixes: a6fddef49eef ("mm/memory-failure: convert unpoison_memory() to folios")
> > > > Cc: stable@xxxxxxxxxxxxxxx #v6.4
> > > > Signed-off-by: Sidhartha Kumar <sidhartha.kumar@xxxxxxxxxx>
> > > >
> > > > [1]: https://lore.kernel.org/lkml/ZLIbZygG7LqSI9xe@xxxxxxxxxxxxxxxxxxxx/
> > > > ---
> > > > mm/memory-failure.c | 2 +-
> > > > 1 file changed, 1 insertion(+), 1 deletion(-)
> > > >
> > > > diff --git a/mm/memory-failure.c b/mm/memory-failure.c
> > > > index 02b1d8f104d51..a114c8c3039cd 100644
> > > > --- a/mm/memory-failure.c
> > > > +++ b/mm/memory-failure.c
> > > > @@ -2523,7 +2523,7 @@ int unpoison_memory(unsigned long pfn)
> > > > goto unlock_mutex;
> > > > }
> > > > - if (!folio_test_hwpoison(folio)) {
> > > > + if (!PageHWPoison(p)) {
> > >
> > >
> > > I don't think this works for hwpoisoned hugetlb pages that have PageHWPoison
> > > set on the head page, rather than on the raw subpage. In the case of
> > > hwpoisoned thps, PageHWPoison is set on the raw subpage, not on the head
> > > pages. (I believe this is not detected because no one considers the
> > > scenario of unpoisoning hwpoisoned thps, which is a rare case). Perhaps the
> > > function is_page_hwpoison() would be useful for this purpose?
> >
> > Sorry, I was wrong. Checking PageHWPoison() is fine because the users of
> > unpoison should know where the PageHWPoison is set via /proc/kpageflags.
> > So this patch is OK to me after comments from other reviewers are resolved.
> >
>
> Hi Naoya,
>
> While taking a closer at the patch, later in unpoison_memory() there is
> also:
>
> - ret = TestClearPageHWPoison(page) ? 0 : -EBUSY;
> + ret = folio_test_clear_hwpoison(folio) ? 0 : -EBUSY;
>
> I thought this folio conversion would be safe because page is the result of
> a compound_head() call but I'm wondering if the same issue exists here and
> we should be calling TestClearPageHWPoison() on the specific subpage by
> doing TestClearPageHWPoison(p).

In this case (get_hwpoison_page returns 0), the target of unpoison_memory was
buddy page or free huge page, so there seems not any realistic problem.
But putting back to TestClearPageHWPoison() looks consistent, so I'm fine with it.

Thanks,
Naoya Horiguchi