Re: [v2 PATCH 3/3] mm: hwpoison: dump page for unhandlable page

From: Yang Shi
Date: Mon Aug 23 2021 - 13:47:22 EST


On Sun, Aug 22, 2021 at 10:05 PM HORIGUCHI NAOYA(堀口 直也)
<naoya.horiguchi@xxxxxxx> wrote:
>
> On Fri, Aug 20, 2021 at 11:40:24AM -0700, Yang Shi wrote:
> > On Thu, Aug 19, 2021 at 11:48 PM HORIGUCHI NAOYA(堀口 直也)
> > <naoya.horiguchi@xxxxxxx> wrote:
> > >
> > > On Wed, Aug 18, 2021 at 10:41:16PM -0700, Yang Shi wrote:
> > > > Currently only a very simple message is shown for an unhandlable page,
> > > > e.g. a non-LRU page, like:
> > > > soft_offline: 0x1469f2: unknown non LRU page type 5ffff0000000000 ()
> > > >
> > > > That is not very helpful for further debugging; calling dump_page()
> > > > shows more useful information.
> > > >
> > > > Call dump_page() in get_any_page() so that the call is not duplicated
> > > > in several different places. It may be called with pcp disabled and the
> > > > memory hotplug lock held, but that should not be a big deal since the
> > > > hwpoison handler is not called very often.
> > > >
> > > > Suggested-by: Matthew Wilcox <willy@xxxxxxxxxxxxx>
> > > > Cc: Naoya Horiguchi <naoya.horiguchi@xxxxxxx>
> > > > Cc: Oscar Salvador <osalvador@xxxxxxx>
> > > > Signed-off-by: Yang Shi <shy828301@xxxxxxxxx>
> > > > ---
> > > > mm/memory-failure.c | 3 +++
> > > > 1 file changed, 3 insertions(+)
> > > >
> > > > diff --git a/mm/memory-failure.c b/mm/memory-failure.c
> > > > index 7cfa134b1370..60df8fcd0444 100644
> > > > --- a/mm/memory-failure.c
> > > > +++ b/mm/memory-failure.c
> > > > @@ -1228,6 +1228,9 @@ static int get_any_page(struct page *p, unsigned long flags)
> > > >  			ret = -EIO;
> > > >  	}
> > > >  out:
> > > > +	if (ret == -EIO)
> > > > +		dump_page(p, "hwpoison: unhandlable page");
> > > > +
> > >
> > > I feel that the 4 callers of get_hwpoison_page() are in different contexts,
> > > so it might be better to consider separately whether each needs dump_page().
> > > soft_offline_page() still prints out "%s: %#lx: unknown page type: %lx (%pGp)"
> >
> > No strong opinion on keeping or removing it.
>
> Reading the explanation below, I think that calling dump_page() in the
> original place is fine. So let's remove "else if (ret == 0)" block in
> soft_offline_page().

IIUC the "else if (ret == 0)" block is used to handle free pages. I
suppose you mean the "else if (ret == -EIO)" block, which just calls
printk.

>
> >
> > > message, which might be a duplicate, so this printk() may be dropped.
> > > In memory_failure_hugetlb() and memory_failure(), we can call dump_page() after
> > > action_result(). unpoison_memory() doesn't need dump_page() at all because
> > > it deals with an already hwpoisoned page.
> >
> > I don't have a strong opinion on whether dump_page() is called before
> > or after the action either; it just moves the dumped page information
> > around that printk.
> >
> > For unpoison_memory(), I think it is harmless to have dump_page()
> > called, right? If get_hwpoison_page() can't return -EIO there, then
> > dump_page() won't be called at all; if it can, then that is exactly
> > the case where we want dump_page() to help debugging.
> >
> > So IMHO calling dump_page() in get_any_page() when -EIO is returned
> > works well for all the cases and avoids duplicating the call.
>
> Fair enough. So could you repost 3/3 with the above change in soft_offline_page()?
>
> Thanks,
> Naoya Horiguchi