Re: [PATCH] mm/memory-failure: fix hardware poison check in unpoison_memory()

From: Naoya Horiguchi
Date: Mon Jul 17 2023 - 20:40:14 EST


On Tue, Jul 18, 2023 at 09:14:09AM +0900, Naoya Horiguchi wrote:
> On Mon, Jul 17, 2023 at 11:18:12AM -0700, Sidhartha Kumar wrote:
> > It was pointed out[1] that using folio_test_hwpoison() is wrong
> > as we need to check the indiviual page that has poison.
> > folio_test_hwpoison() only checks the head page so go back to using
> > PageHWPoison().
> >
> > Reported-by: Matthew Wilcox (Oracle) <willy@xxxxxxxxxxxxx>
> > Fixes: a6fddef49eef ("mm/memory-failure: convert unpoison_memory() to folios")
> > Cc: stable@xxxxxxxxxxxxxxx #v6.4
> > Signed-off-by: Sidhartha Kumar <sidhartha.kumar@xxxxxxxxxx>
> >
> > [1]: https://lore.kernel.org/lkml/ZLIbZygG7LqSI9xe@xxxxxxxxxxxxxxxxxxxx/
> > ---
> > mm/memory-failure.c | 2 +-
> > 1 file changed, 1 insertion(+), 1 deletion(-)
> >
> > diff --git a/mm/memory-failure.c b/mm/memory-failure.c
> > index 02b1d8f104d51..a114c8c3039cd 100644
> > --- a/mm/memory-failure.c
> > +++ b/mm/memory-failure.c
> > @@ -2523,7 +2523,7 @@ int unpoison_memory(unsigned long pfn)
> > goto unlock_mutex;
> > }
> >
> > - if (!folio_test_hwpoison(folio)) {
> > + if (!PageHWPoison(p)) {
>
>
> I don't think this works for hwpoisoned hugetlb pages that have PageHWPoison
> set on the head page, rather than on the raw subpage. In the case of
> hwpoisoned thps, PageHWPoison is set on the raw subpage, not on the head
> pages. (I believe this is not detected because no one considers the
> scenario of unpoisoning hwpoisoned thps, which is a rare case). Perhaps the
> function is_page_hwpoison() would be useful for this purpose?

Sorry, I was wrong. Checking PageHWPoison() is fine because the users of
unpoison should know where the PageHWPoison is set via /proc/kpageflags.
So this patch is OK to me after comments from other reviewers are resolved.

Acked-by: Naoya Horiguchi <naoya.horiguchi@xxxxxxx>

Thanks,
Naoya Horiguchi