Re: [PATCH v1 1/3] mm/hwpoison: find subpage in hugetlb HWPOISON list

From: Mike Kravetz
Date: Fri Jun 16 2023 - 19:35:44 EST


On 06/16/23 14:19, Jiaqi Yan wrote:
> On Sun, Jun 11, 2023 at 9:19 PM Naoya Horiguchi
> <naoya.horiguchi@xxxxxxxxx> wrote:
> >
> > On Fri, Jun 09, 2023 at 10:48:47PM -0700, Jiaqi Yan wrote:
> > > On Thu, May 25, 2023 at 5:28 PM Jiaqi Yan <jiaqiyan@xxxxxxxxxx> wrote:
> > > >
> > > > On Mon, May 22, 2023 at 7:43 PM HORIGUCHI NAOYA(堀口 直也)
> > > > <naoya.horiguchi@xxxxxxx> wrote:
> > > > >
> > > > > On Mon, May 22, 2023 at 11:22:49AM -0700, Jiaqi Yan wrote:
> > > > > > On Sun, May 21, 2023 at 9:50 PM HORIGUCHI NAOYA(堀口 直也)
> > > > > > <naoya.horiguchi@xxxxxxx> wrote:
> > > > > > >
> > > > > > > On Fri, May 19, 2023 at 03:42:14PM -0700, Mike Kravetz wrote:
> > > > > > > > On 05/19/23 13:54, Jiaqi Yan wrote:
> > > > > > > > > On Wed, May 17, 2023 at 4:53 PM Mike Kravetz <mike.kravetz@xxxxxxxxxx> wrote:
> > > > > > > > > >
> > > > > > > > > > On 05/17/23 16:09, Jiaqi Yan wrote:
> > > > > > > > > > > Adds the functionality to search a subpage's corresponding raw_hwp_page
> > > > > > > > > > > in hugetlb page's HWPOISON list. This functionality can also tell if a
> > > > > > > > > > > subpage is a raw HWPOISON page.
> > > > > > > > > > >
> > > > > > > > > > > Exports this functionality to be immediately used in the read operation
> > > > > > > > > > > for hugetlbfs.
> > > > > > > > > > >
> > > > > > > > > > > Signed-off-by: Jiaqi Yan <jiaqiyan@xxxxxxxxxx>
> > > > > > > > > > > ---
> > > > > > > > > > > include/linux/mm.h | 23 +++++++++++++++++++++++
> > > > > > > > > > > mm/memory-failure.c | 26 ++++++++++++++++----------
> > > > > > > > > > > 2 files changed, 39 insertions(+), 10 deletions(-)
> > > > > > > > > > >
> > > > > > > > > > > diff --git a/include/linux/mm.h b/include/linux/mm.h
> > > > > > > > > > > index 27ce77080c79..f191a4119719 100644
> > > > > > > > > > > --- a/include/linux/mm.h
> > > > > > > > > > > +++ b/include/linux/mm.h
> > > > > > > > > >
> > > > > > > > > > Any reason why you decided to add the following to linux/mm.h instead of
> > > > > > > > > > linux/hugetlb.h? Since it is hugetlb specific I would have thought
> > > > > > > > > > hugetlb.h was more appropriate.
> > > > > > > > > >
> > > > > > > > > > > @@ -3683,6 +3683,29 @@ enum mf_action_page_type {
> > > > > > > > > > > */
> > > > > > > > > > > extern const struct attribute_group memory_failure_attr_group;
> > > > > > > > > > >
> > > > > > > > > > > +#ifdef CONFIG_HUGETLB_PAGE
> > > > > > > > > > > +/*
> > > > > > > > > > > + * Struct raw_hwp_page represents information about "raw error page",
> > > > > > > > > > > + * constructing singly linked list from ->_hugetlb_hwpoison field of folio.
> > > > > > > > > > > + */
> > > > > > > > > > > +struct raw_hwp_page {
> > > > > > > > > > > + struct llist_node node;
> > > > > > > > > > > + struct page *page;
> > > > > > > > > > > +};
> > > > > > > > > > > +
> > > > > > > > > > > +static inline struct llist_head *raw_hwp_list_head(struct folio *folio)
> > > > > > > > > > > +{
> > > > > > > > > > > + return (struct llist_head *)&folio->_hugetlb_hwpoison;
> > > > > > > > > > > +}
> > > > > > > > > > > +
> > > > > > > > > > > +/*
> > > > > > > > > > > + * Given @subpage, a raw page in a hugepage, find its location in @folio's
> > > > > > > > > > > + * _hugetlb_hwpoison list. Return NULL if @subpage is not in the list.
> > > > > > > > > > > + */
> > > > > > > > > > > +struct raw_hwp_page *find_raw_hwp_page(struct folio *folio,
> > > > > > > > > > > + struct page *subpage);
> > > > > > > > > > > +#endif
> > > > > > > > > > > +
> > > > > > > > > > > #if defined(CONFIG_TRANSPARENT_HUGEPAGE) || defined(CONFIG_HUGETLBFS)
> > > > > > > > > > > extern void clear_huge_page(struct page *page,
> > > > > > > > > > > unsigned long addr_hint,
> > > > > > > > > > > diff --git a/mm/memory-failure.c b/mm/memory-failure.c
> > > > > > > > > > > index 5b663eca1f29..c49e6c2d1f07 100644
> > > > > > > > > > > --- a/mm/memory-failure.c
> > > > > > > > > > > +++ b/mm/memory-failure.c
> > > > > > > > > > > @@ -1818,18 +1818,24 @@ EXPORT_SYMBOL_GPL(mf_dax_kill_procs);
> > > > > > > > > > > #endif /* CONFIG_FS_DAX */
> > > > > > > > > > >
> > > > > > > > > > > #ifdef CONFIG_HUGETLB_PAGE
> > > > > > > > > > > -/*
> > > > > > > > > > > - * Struct raw_hwp_page represents information about "raw error page",
> > > > > > > > > > > - * constructing singly linked list from ->_hugetlb_hwpoison field of folio.
> > > > > > > > > > > - */
> > > > > > > > > > > -struct raw_hwp_page {
> > > > > > > > > > > - struct llist_node node;
> > > > > > > > > > > - struct page *page;
> > > > > > > > > > > -};
> > > > > > > > > > >
> > > > > > > > > > > -static inline struct llist_head *raw_hwp_list_head(struct folio *folio)
> > > > > > > > > > > +struct raw_hwp_page *find_raw_hwp_page(struct folio *folio,
> > > > > > > > > > > + struct page *subpage)
> > > > > > > > > > > {
> > > > > > > > > > > - return (struct llist_head *)&folio->_hugetlb_hwpoison;
> > > > > > > > > > > + struct llist_node *t, *tnode;
> > > > > > > > > > > + struct llist_head *raw_hwp_head = raw_hwp_list_head(folio);
> > > > > > > > > > > + struct raw_hwp_page *hwp_page = NULL;
> > > > > > > > > > > + struct raw_hwp_page *p;
> > > > > > > > > > > +
> > > > > > > > > > > + llist_for_each_safe(tnode, t, raw_hwp_head->first) {
> > > > > > > > > >
> > > > > > > > > > IIUC, in rare error cases a hugetlb page can be poisoned WITHOUT a
> > > > > > > > > > raw_hwp_list. This is indicated by the hugetlb page specific flag
> > > > > > > > > > RawHwpUnreliable or folio_test_hugetlb_raw_hwp_unreliable().
> > > > > > > > > >
> > > > > > > > > > Looks like this routine does not consider that case. Seems like it should
> > > > > > > > > > always return the passed subpage if folio_test_hugetlb_raw_hwp_unreliable()
> > > > > > > > > > is true?
> > > > > > > > >
> > > > > > > > > Thanks for catching this. I wonder should this routine consider
> > > > > > > > > RawHwpUnreliable or should the caller do.
> > > > > > > > >
> > > > > > > > > find_raw_hwp_page now returns raw_hwp_page* in the llist entry to
> > > > > > > > > caller (valid one at the moment), but once RawHwpUnreliable is set,
> > > > > > > > > all the raw_hwp_page in the llist will be kfree(), and the returned
> > > > > > > > > value becomes dangling pointer to caller (if the caller holds that
> > > > > > > > > caller long enough). Maybe returning a bool would be safer to the
> > > > > > > > > caller? If the routine returns bool, then checking RawHwpUnreliable
> > > > > > > > > can definitely be within the routine.
> > > > > > > >
> > > > > > > > I think the check for RawHwpUnreliable should be within this routine.
> > > > > > > > Looking closer at the code, I do not see any way to synchronize this.
> > > > > > > > It looks like manipulation in the memory-failure code would be
> > > > > > > > synchronized via the mf_mutex. However, I do not see how traversal and
> > > > > > > > freeing of the raw_hwp_list called from __update_and_free_hugetlb_folio
> > > > > > > > is synchronized against memory-failure code modifying the list.
> > > > > > > >
> > > > > > > > Naoya, can you provide some thoughts?
>
> Hi Mike,
>
> Now looking again this, I think concurrent adding and deleting are
> fine with each other and with themselves, because raw_hwp_list is
> lock-less llist.

Correct.

> As for synchronizing traversal with adding and deleting, I wonder is
> it a good idea to make __update_and_free_hugetlb_folio hold
> hugetlb_lock before it folio_clear_hugetlb_hwpoison(which traverse +
> delete raw_hwp_list)? In hugetlb, get_huge_page_for_hwpoison already
> takes hugetlb_lock; it seems to me __update_and_free_hugetlb_folio is
> missing the lock.

I do not think the lock is needed. However, while looking more closely
at this I think I discovered another issue.
This is VERY subtle.
Perhaps Naoya can help verify if my reasoning below is correct.

In __update_and_free_hugetlb_folio we are not operating on a hugetlb page.
Why is this?
Before calling update_and_free_hugetlb_folio we call remove_hugetlb_folio.
The purpose of remove_hugetlb_folio is to remove the huge page from the
list AND compound page destructor indicating this is a hugetlb page is changed.
This is all done while holding the hugetlb lock. So, the test for
folio_test_hugetlb(folio) is false.

We have technically a compound non-hugetlb page with a non-null raw_hwp_list.

Important note: at this time we have not reallocated vmemmap pages if
hugetlb page was vmemmap optimized. That is done later in
__update_and_free_hugetlb_folio.

The 'good news' is that after this point get_huge_page_for_hwpoison will
not recognize this as a hugetlb page, so nothing will be added to the
list. There is no need to worry about entries being added to the list
during traversal.

The 'bad news' is that if we get a memory error at this time we will
treat it as a memory error on a regular compound page. So,
TestSetPageHWPoison(p) in memory_failure() may try to write a read only
struct page. :(
--
Mike Kravetz