Re: [PATCH 1/2] mm: hwpoison: don't drop slab caches for offlining non-LRU page

From: Yang Shi
Date: Wed Aug 18 2021 - 13:45:25 EST


On Tue, Aug 17, 2021 at 10:02 PM HORIGUCHI NAOYA(堀口 直也)
<naoya.horiguchi@xxxxxxx> wrote:
>
> On Mon, Aug 16, 2021 at 01:24:25PM -0700, Yang Shi wrote:
> > On Mon, Aug 16, 2021 at 12:38 PM Matthew Wilcox <willy@xxxxxxxxxxxxx> wrote:
> > >
> > > On Mon, Aug 16, 2021 at 11:09:08AM -0700, Yang Shi wrote:
> > > > But the most disappointing thing is all the effort doesn't make the page
> > > > offline, it just returns:
> > > >
> > > > soft_offline: 0x1469f2: unknown non LRU page type 5ffff0000000000 ()
> > >
> > > It's a shame it doesn't call dump_page(). There might be more
> > > interesting information somewhere in struct page that would help us
> > > figure out what kind of page it was in your environment. For example,
> > > it might be a page table page or a page allocated for vmalloc(), and
> > > in both those cases, there are things we might be able to do (we'd
> > > certainly be able to figure out that it isn't worth shrinking slab!)
> >
> > Yes, dump_page() could give us more information. I could add a
> > new patch, or just update this patch, to call dump_page() when
> > offlining fails, if the hwpoison maintainers agree to this as well.
>
> I agree with showing more information in failure case. Thanks for the input.

From reading the code, get_any_page() seems to be called to shake the
page for both soft offline and memory_failure(), so it looks like a
good place to call dump_page() when -EIO is about to be returned, i.e.
when hwpoison can't handle the page. Otherwise we may need to call
dump_page() in a couple of different places.

dump_page() would then be called with pcp disabled and the memory
hotplug lock held when it is called from get_any_page(), but I suppose
that should not be a big deal.
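
A rough userspace sketch of the idea, not kernel code: if every caller
goes through one function, a single dump on the -EIO path covers both
soft offline and memory_failure(). Everything here is a stand-in and an
assumption for illustration: the struct page fields, the "handleable"
flag, dump_page_stub(), get_any_page_model(), the reason string and the
simplified return values are all hypothetical and do not reflect the
real get_any_page() logic.

```c
#include <assert.h>
#include <errno.h>
#include <stdio.h>

/* Userspace stand-in for struct page; only the fields this sketch needs. */
struct page {
	unsigned long pfn;
	unsigned long long flags;
	int handleable;	/* hypothetical: 1 if hwpoison can handle this page type */
};

/* Counts dump calls so the behavior can be checked. */
static int dump_calls;

/* Stand-in for the kernel's dump_page(page, reason). */
static void dump_page_stub(const struct page *page, const char *reason)
{
	dump_calls++;
	fprintf(stderr, "page: pfn=%#lx flags=%#llx\n", page->pfn, page->flags);
	fprintf(stderr, "page dumped because: %s\n", reason);
}

/*
 * Model of the common entry point. The real function does much more
 * (refcounting, shaking the page, retries); this only shows where the
 * dump would sit relative to the -EIO return.
 */
static int get_any_page_model(const struct page *page)
{
	assert(page != NULL);

	if (page->handleable)
		return 0;	/* simplified success; caller continues */

	/* One dump here serves every caller that sees -EIO. */
	dump_page_stub(page, "hwpoison: unhandlable page");
	return -EIO;
}
```

With this shape, a caller that gets -EIO back does not need its own
dump_page() call, which is the point of putting it in the shared path.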

>
> - Naoya Horiguchi