Re: [PATCH v1 1/3] mm/hwpoison: find subpage in hugetlb HWPOISON list

From: Mike Kravetz
Date: Sat Jun 17 2023 - 19:47:35 EST


On 06/16/23 19:18, Jiaqi Yan wrote:
> On Fri, Jun 16, 2023 at 4:35 PM Mike Kravetz <mike.kravetz@xxxxxxxxxx> wrote:
> > On 06/16/23 14:19, Jiaqi Yan wrote:
> > >
> > > Now looking again this, I think concurrent adding and deleting are
> > > fine with each other and with themselves, because raw_hwp_list is
> > > lock-less llist.
> >
> > Correct.
> >
> > > As for synchronizing traversal with adding and deleting, I wonder is
> > > it a good idea to make __update_and_free_hugetlb_folio hold
> > > hugetlb_lock before it folio_clear_hugetlb_hwpoison(which traverse +
> > > delete raw_hwp_list)? In hugetlb, get_huge_page_for_hwpoison already
> > > takes hugetlb_lock; it seems to me __update_and_free_hugetlb_folio is
> > > missing the lock.
> >
> > I do not think the lock is needed. However, while looking more closely
> > at this I think I discovered another issue.
> > This is VERY subtle.
> > Perhaps Naoya can help verify if my reasoning below is correct.
> >
> > In __update_and_free_hugetlb_folio we are not operating on a hugetlb page.
> > Why is this?
> > Before calling update_and_free_hugetlb_folio we call remove_hugetlb_folio.
> > The purpose of remove_hugetlb_folio is to remove the huge page from the
> > list AND compound page destructor indicating this is a hugetlb page is changed.
> > This is all done while holding the hugetlb lock. So, the test for
> > folio_test_hugetlb(folio) is false.
> >
> > We have technically a compound non-hugetlb page with a non-null raw_hwp_list.
> >
> > Important note: at this time we have not reallocated vmemmap pages if
> > hugetlb page was vmemmap optimized. That is done later in
> > __update_and_free_hugetlb_folio.
>
>
> >
> > The 'good news' is that after this point get_huge_page_for_hwpoison will
> > not recognize this as a hugetlb page, so nothing will be added to the
> > list. There is no need to worry about entries being added to the list
> > during traversal.
> >
> > The 'bad news' is that if we get a memory error at this time we will
> > treat it as a memory error on a regular compound page. So,
> > TestSetPageHWPoison(p) in memory_failure() may try to write a read only
> > struct page. :(
>
> At least I think this is an issue.
>
> Would it help if dissolve_free_huge_page doesn't unlock hugetlb_lock
> until update_and_free_hugetlb_folio is done, or basically until
> dissolve_free_huge_page is done?

Unfortunately, update_and_free_hugetlb_folio is designed to be called
without locks held. This is because we can not hold any locks while
allocating vmemmap pages.

I'll try to think of some way to restructure the code. IIUC, this is a
potential general issue, not just isolated to memory error handling.
--
Mike Kravetz

>
> TestSetPageHWPoison in memory_failure is called after
> try_memory_failure_hugetlb, and folio_test_hugetlb is tested within
> __get_huge_page_for_hwpoison, which is wrapped by the hugetlb_lock. So
> by the time dissolve_free_huge_page returns, subpages already go
> through hugetlb_vmemmap_restore and __destroy_compound_gigantic_folio
> and become non-compound raw pages (folios). Now
> folio_test_hugetlb(p)=false will be correct for memory_failure, and it
> can recover p as a dissolved non-hugetlb page.