Re: [PATCH v2 2/2] hugetlb: optimize update_and_free_pages_bulk to avoid lock cycles

From: Mike Kravetz
Date: Wed Jul 19 2023 - 20:21:15 EST


On 07/19/23 17:02, James Houghton wrote:
> On Tue, Jul 18, 2023 at 9:47 AM Mike Kravetz <mike.kravetz@xxxxxxxxxx> wrote:
> >
> > On 07/18/23 09:31, James Houghton wrote:
> > > On Mon, Jul 17, 2023 at 5:50 PM Mike Kravetz <mike.kravetz@xxxxxxxxxx> wrote:
> > > > + * destructor of all pages on list.
> > > > + */
> > > > + if (clear_dtor) {
> > > > + spin_lock_irq(&hugetlb_lock);
> > > > + list_for_each_entry(page, list, lru)
> > > > + __clear_hugetlb_destructor(h, page_folio(page));
> > > > + spin_unlock_irq(&hugetlb_lock);
> > > > }
> > >
> > > I'm not too familiar with this code, but the above block seems weird
> > > to me. If we successfully allocated the vmemmap for *any* folio, we
> > > clear the hugetlb destructor for all the folios? I feel like we should
> > > only be clearing the hugetlb destructor for all folios if the vmemmap
> > > allocation succeeded for *all* folios. If the code is functionally
> > > correct as is, I'm a little bit confused why we need `clear_dtor`; it
> > > seems like this function doesn't really need it. (I could have some
> > > huge misunderstanding here.)
> > >
> >
> > Yes, it is a bit strange.
> >
> > I was thinking this has to also handle the case where hugetlb vmemmap
> > optimization is off system wide. In that case, clear_dtor would never
> > be set and there is no sense in ever walking the list and calling
> > __clear_hugetlb_destructor() would would be a NOOP in this case. Think
> > of the case where there are TBs of hugetlb pages.
> >
> > That is one of the reasons I made __clear_hugetlb_destructor() check
> > for the need to modify the destructor. The other reason is in the
> > dissolve_free_huge_page() code path where we allocate vmemmap. I
> > suppose, there could be an explicit call to __clear_hugetlb_destructor()
> > in dissolve_free_huge_page. But, I thought it might be better if
> > we just handled both cases here.
> >
> > My thinking is that the clear_dtor boolean would tell us if vmemmap was
> > restored for ANY hugetlb page. I am aware that just because vmemmap was
> > allocated for one page, does not mean that it was allocated for others.
> > However, in the common case where hugetlb vmemmap optimization is on
> > system wide, we would have allocated vmemmap for all pages on the list
> > and would need to clear the destructor for them all.
> >
> > So, clear_dtor is really just an optimization for the
> > hugetlb_free_vmemmap=off case. Perhaps that is just over thinking and
> > not a useful miro-optimization.
>
> Ok I think I understand; I think the micro-optimization is fine to
> add. But I think there's still a bug here:
>
> If we have two vmemmap-optimized hugetlb pages and restoring the page
> structs for one of them fails, that page will end up with the
> incorrect dtor (add_hugetlb_folio will set it properly, but then we
> clear it afterwards because clear_dtor was set).
>
> What do you think?

add_hugetlb_folio() will call enqueue_hugetlb_folio() which will move
the folio from the existing list we are processing to the hugetlb free
list. Therefore, the page for which we could not restore vmemmap is not
on the list for that 'if (clear_dtor)' block of code.

--
Mike Kravetz