Re: [PATCH v4 4/8] hugetlb: create remove_hugetlb_page() to separate functionality

From: Mike Kravetz
Date: Tue Apr 06 2021 - 12:50:24 EST


On 4/6/21 2:56 AM, Michal Hocko wrote:
> On Mon 05-04-21 16:00:39, Mike Kravetz wrote:
>> The new remove_hugetlb_page() routine is designed to remove a hugetlb
>> page from hugetlbfs processing. It will remove the page from the active
>> or free list, update global counters and set the compound page
>> destructor to NULL so that PageHuge() will return false for the 'page'.
>> After this call, the 'page' can be treated as a normal compound page or
>> a collection of base size pages.
>>
>> update_and_free_page no longer decrements h->nr_huge_pages{_node} as
>> this is performed in remove_hugetlb_page. The only functionality
>> performed by update_and_free_page is to free the base pages to the lower
>> level allocators.
>>
>> update_and_free_page is typically called after remove_hugetlb_page.
>>
>> remove_hugetlb_page is to be called with the hugetlb_lock held.
>>
>> Creating this routine and separating functionality is in preparation for
>> restructuring code to reduce lock hold times. This commit should not
>> introduce any changes to functionality.
>>
>> Signed-off-by: Mike Kravetz <mike.kravetz@xxxxxxxxxx>
>
> Btw. I would prefer to reverse the ordering of this and Oscar's
> patchset. This one is a bug fix which might be interesting for stable
> backports while Oscar's work can be looked as a general functionality
> improvement.

Ok, that makes sense.

Andrew, can we make this happen? It would require removing Oscar's
series until it can be modified to work on top of this.
There is only one small issue with this series as it originally went
into mmotm. There is a missing conversion of spin_lock to spin_lock_irq
in patch 7. In addition, there are some suggested changes from Oscar to
this patch. I do not think they are necessary, but I could make those
as well. Let me know what I can do to help make this happen.

>> @@ -2298,6 +2312,7 @@ static int alloc_and_dissolve_huge_page(struct hstate *h, struct page *old_page,
>> /*
>> * Freed from under us. Drop new_page too.
>> */
>> + remove_hugetlb_page(h, new_page, false);
>> update_and_free_page(h, new_page);
>> goto unlock;
>> } else if (page_count(old_page)) {
>> @@ -2305,6 +2320,7 @@ static int alloc_and_dissolve_huge_page(struct hstate *h, struct page *old_page,
>> * Someone has grabbed the page, try to isolate it here.
>> * Fail with -EBUSY if not possible.
>> */
>> + remove_hugetlb_page(h, new_page, false);
>> update_and_free_page(h, new_page);
>> spin_unlock(&hugetlb_lock);
>> if (!isolate_huge_page(old_page, list))
>
> the page is not enqued anywhere here so remove_hugetlb_page would blow
> when linked list debugging is enabled.

I also thought this would be an issue. However, INIT_LIST_HEAD would
have been called for the page so,

static inline void INIT_LIST_HEAD(struct list_head *list)
{
WRITE_ONCE(list->next, list);
list->prev = list;
}

The debug checks of concern in __list_del_entry_valid are:

CHECK_DATA_CORRUPTION(prev->next != entry,
"list_del corruption. prev->next should be %px, but was
%px\n",
entry, prev->next) ||
CHECK_DATA_CORRUPTION(next->prev != entry,
"list_del corruption. next->prev should be %px, but was
%px\n",
entry, next->prev))

Since, all pointers point back to the list(head) the check passes. My
track record with the list routines is not so good, so I actually
forced list_del after INIT_LIST_HEAD with list debugging enabled and did
not enounter any issues.

Going forward, I agree it would be better to perhaps add a list_empty
check so that things do not blow up if the debugging code is changed.

At one time I also thought of splitting the functionality in
alloc_fresh_huge_page and prep_new_huge_page so that it would only
allocate/prep the page but not increment nr_huge_pages. A new routine
would be used to increment the counter when it was actually put into use.
I thought this could be used when doing bulk adjustments in set_max_huge_pages
but the benefit would be minimal. This seems like something that would
be useful in Oscar's alloc_and_dissolve_huge_page routine.
--
Mike Kravetz