Re: [PATCH v2] mm/folio: Avoid special handling for order value 0 in folio_set_order

From: Tarun Sahu
Date: Thu Jun 08 2023 - 06:03:46 EST


Hi Mike,

Please find my comments inline.

Mike Kravetz <mike.kravetz@xxxxxxxxxx> writes:

> On 06/06/23 10:32, Tarun Sahu wrote:
>>
>> Hi Mike,
>>
>> Thanks for your inputs.
>> I wanted to know if you find it okay. Can I send it again adding your Reviewed-by?
>
> Hi Tarun,
>
> Just a few more comments/questions.
>
> On 05/15/23 22:38, Tarun Sahu wrote:
>> folio_set_order(folio, 0) is used in kernel at two places
>> __destroy_compound_gigantic_folio and __prep_compound_gigantic_folio.
>> Currently, it is called to clear out folio->_folio_nr_pages and
>> folio->_folio_order.
>>
>> For __destroy_compound_gigantic_folio:
>> In the past, folio_set_order(folio, 0) was needed because page->mapping used
>> to overlap with _folio_nr_pages and _folio_order. So if these fields were
>> left uncleared when freeing gigantic hugepages, they caused
>> "BUG: bad page state" errors due to a non-zero page->mapping. Now, after
>> commit a01f43901cfb ("hugetlb: be sure to free demoted CMA pages to
>> CMA"), page->mapping is explicitly cleared for tail pages. Also,
>> _folio_order and _folio_nr_pages no longer overlap with page->mapping.
>
> I believe the same logic/reasoning as above also applies to
> __prep_compound_gigantic_folio.
> Why?
> In __prep_compound_gigantic_folio we only call folio_set_order(folio, 0)
> in the case of error. If __prep_compound_gigantic_folio fails, the caller
> will then call free_gigantic_folio() on the "gigantic page". However, it is
> not really a gigantic folio at this point in time, and we are simply calling
> cma_release() or free_contig_range().
> The end result is that I do not believe the existing call to
> folio_set_order(folio, 0) in __prep_compound_gigantic_folio is actually
> required. ???
No, there is a difference. IIUC, __destroy_compound_gigantic_folio
explicitly resets page->mapping for each page of the compound page, which
ensures that even if some fields of struct page/folio come to overlap
with page->mapping in the future, it won't cause a `BUG: bad page state`
error. But if we just remove folio_set_order(folio, 0) from
__prep_compound_gigantic_folio without also moving
folio_set_order(folio, order), we add maintenance overhead: every time
the struct page fields change, someone has to check whether
page->_folio_order overlaps with page->mapping, since an overlap would
leave page->mapping non-zero on the error path. IMHO, to avoid that,
folio_set_order(folio, order) should be moved after all the error checks
on the tail pages are done, so that _folio_order is set on success and
never set in the error case (which is the original proposal). For
__folio_set_head, though, I agree with the way you suggested below.

WDYT?
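
To make the ordering argument concrete, here is a toy userspace sketch
(made-up names: toy_folio, toy_set_order, toy_prep_gigantic; NOT the
actual kernel code, which also handles __ClearPageReserved and refcount
freezing): if the order is set only after all per-page checks succeed,
the error path never sees a non-zero order, so no folio_set_order(folio, 0)
is needed and the order-0 special case can go away.

```c
#include <assert.h>
#include <stdbool.h>

struct toy_folio {
	unsigned char order;    /* stands in for _folio_order */
	unsigned int nr_pages;  /* stands in for _folio_nr_pages */
	bool head;              /* stands in for the head flag */
};

/* No special case for order 0: nr_pages is always 1 << order. */
static void toy_set_order(struct toy_folio *f, unsigned int order)
{
	f->order = (unsigned char)order;
	f->nr_pages = 1U << order;
}

/* Returns true on success. tail_ok[] simulates the per-page checks
 * (reserved bit, frozen refcount) that the real prep function does. */
static bool toy_prep_gigantic(struct toy_folio *f, unsigned int order,
			      const bool *tail_ok)
{
	unsigned int nr = 1U << order;

	for (unsigned int i = 0; i < nr; i++) {
		if (!tail_ok[i])
			return false; /* order never set: nothing to undo */
	}
	f->head = true;
	toy_set_order(f, order); /* reached only after all checks pass */
	return true;
}
```

On the failure path the folio's order and nr_pages fields are simply
never written, which is the whole point of moving the call.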

>
> If my reasoning above is correct, then we could just have one patch to
> remove the folio_set_order(folio, 0) calls and remove special casing for
> order 0 in folio_set_order.
>
> However, I still believe your restructuring of __prep_compound_gigantic_folio,
> is of value. I do not believe there is an issue as questioned by Matthew. My
> reasoning has been stated previously. We could make changes like the following
> to retain the same order of operations in __prep_compound_gigantic_folio and
> totally avoid Matthew's question. Totally untested.
>
> diff --git a/mm/hugetlb.c b/mm/hugetlb.c
> index ea24718db4af..a54fee663cb1 100644
> --- a/mm/hugetlb.c
> +++ b/mm/hugetlb.c
> @@ -1950,10 +1950,8 @@ static bool __prep_compound_gigantic_folio(struct folio *folio,
> int nr_pages = 1 << order;
> struct page *p;
>
> - __folio_clear_reserved(folio);
> - __folio_set_head(folio);
> /* we rely on prep_new_hugetlb_folio to set the destructor */
> - folio_set_order(folio, order);
> +
> for (i = 0; i < nr_pages; i++) {
> p = folio_page(folio, i);
>
> @@ -1969,7 +1967,7 @@ static bool __prep_compound_gigantic_folio(struct folio *folio,
> * on the head page when they need know if put_page() is needed
> * after get_user_pages().
> */
> - if (i != 0) /* head page cleared above */
> + if (i != 0) /* head page cleared below */
> __ClearPageReserved(p);
> /*
> * Subtle and very unlikely
> @@ -1996,8 +1994,14 @@ static bool __prep_compound_gigantic_folio(struct folio *folio,
> } else {
> VM_BUG_ON_PAGE(page_count(p), p);
> }
> - if (i != 0)
> +
> + if (i == 0) {
> + __folio_clear_reserved(folio);
> + __folio_set_head(folio);
> + folio_set_order(folio, order);
With __folio_set_head, I agree with this, but I am not comfortable doing
the same for folio_set_order, per my reasoning above. WDYT?

> + } else {
> set_compound_head(p, &folio->page);
> + }
> }
> atomic_set(&folio->_entire_mapcount, -1);
> atomic_set(&folio->_nr_pages_mapped, 0);
> @@ -2017,7 +2021,6 @@ static bool __prep_compound_gigantic_folio(struct folio *folio,
> p = folio_page(folio, j);
> __ClearPageReserved(p);
> }
> - folio_set_order(folio, 0);
> __folio_clear_head(folio);
> return false;
> }
>
>
>>
>> struct page {
>> ...
>> struct address_space * mapping; /* 24 8 */
>> ...
>> }
>>
>> struct folio {
>> ...
>> union {
>> struct {
>> long unsigned int _flags_1; /* 64 8 */
>> long unsigned int _head_1; /* 72 8 */
>> unsigned char _folio_dtor; /* 80 1 */
>> unsigned char _folio_order; /* 81 1 */
>>
>> /* XXX 2 bytes hole, try to pack */
>>
>> atomic_t _entire_mapcount; /* 84 4 */
>> atomic_t _nr_pages_mapped; /* 88 4 */
>> atomic_t _pincount; /* 92 4 */
>> unsigned int _folio_nr_pages; /* 96 4 */
>> }; /* 64 40 */
>> struct page __page_1 __attribute__((__aligned__(8))); /* 64 64 */
>> }
>> ...
>> }
>
> I do not think the copy of page/folio definitions adds much value to the
> commit message.
Yeah, will remove it.
>
> --
> Mike Kravetz