Re: [PATCH v2] mm/folio: Avoid special handling for order value 0 in folio_set_order

From: Mike Kravetz
Date: Tue Jun 06 2023 - 11:59:23 EST


On 06/06/23 10:32, Tarun Sahu wrote:
>
> Hi Mike,
>
> Thanks for your inputs.
> I wanted to know if you find it okay. Can I send it again, adding your Reviewed-by?

Hi Tarun,

Just a few more comments/questions.

On 05/15/23 22:38, Tarun Sahu wrote:
> folio_set_order(folio, 0) is used in the kernel in two places:
> __destroy_compound_gigantic_folio and __prep_compound_gigantic_folio.
> Currently, it is called to clear out folio->_folio_nr_pages and
> folio->_folio_order.
>
> For __destroy_compound_gigantic_folio:
> In the past, folio_set_order(folio, 0) was needed because page->mapping used
> to overlap with _folio_nr_pages and _folio_order. So if these fields were
> left uncleared when freeing gigantic hugepages, they caused
> "BUG: bad page state" due to a non-zero page->mapping. Now, after
> commit a01f43901cfb ("hugetlb: be sure to free demoted CMA pages to
> CMA"), page->mapping is explicitly cleared for tail pages. Also,
> _folio_order and _folio_nr_pages no longer overlap with page->mapping.

I believe the same logic/reasoning as above also applies to
__prep_compound_gigantic_folio.
Why?
In __prep_compound_gigantic_folio we only call folio_set_order(folio, 0)
in the case of error. If __prep_compound_gigantic_folio fails, the caller
will then call free_gigantic_folio() on the "gigantic page". However, it is
not really a gigantic folio at this point in time, and we are simply calling
cma_release() or free_contig_range().
The end result is that I do not believe the existing call to
folio_set_order(folio, 0) in __prep_compound_gigantic_folio is actually
required. ???
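
For reference, free_gigantic_folio() boils down to roughly the following
(paraphrased from memory, so treat it as a sketch rather than the exact
mainline source). Note that it takes the order from its caller and never
looks at _folio_order or _folio_nr_pages in the folio itself:

static void free_gigantic_folio(struct folio *folio, unsigned int order)
{
#ifdef CONFIG_CMA
	int nid = folio_nid(folio);

	/*
	 * If the folio did not come from the hugetlb CMA area,
	 * cma_release() returns false and we fall through.
	 */
	if (cma_release(hugetlb_cma[nid], &folio->page, 1 << order))
		return;
#endif

	free_contig_range(folio_pfn(folio), 1 << order);
}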

If my reasoning above is correct, then we could just have one patch to
remove the folio_set_order(folio, 0) calls and remove special casing for
order 0 in folio_set_order.
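
For context, the order-0 special casing in question is in folio_set_order()
in mm/internal.h, which currently looks something like this (again from
memory, so only approximate):

static inline void folio_set_order(struct folio *folio, unsigned int order)
{
	if (WARN_ON_ONCE(!folio_test_large(folio)))
		return;

	folio->_folio_order = order;
#ifdef CONFIG_64BIT
	/*
	 * When hugetlb dissolves a folio, we need to clear the tail
	 * page, rather than setting nr_pages to 1.
	 */
	folio->_folio_nr_pages = order ? 1U << order : 0;
#endif
}

With the two order-0 callers gone, the ternary could become a plain
1U << order assignment, and the WARN could also cover !order.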

However, I still believe your restructuring of __prep_compound_gigantic_folio
is of value. I do not believe there is an issue as questioned by Matthew. My
reasoning has been stated previously. We could make changes like the following
to retain the same order of operations in __prep_compound_gigantic_folio and
avoid Matthew's question entirely. Totally untested.

diff --git a/mm/hugetlb.c b/mm/hugetlb.c
index ea24718db4af..a54fee663cb1 100644
--- a/mm/hugetlb.c
+++ b/mm/hugetlb.c
@@ -1950,10 +1950,8 @@ static bool __prep_compound_gigantic_folio(struct folio *folio,
 	int nr_pages = 1 << order;
 	struct page *p;
 
-	__folio_clear_reserved(folio);
-	__folio_set_head(folio);
 	/* we rely on prep_new_hugetlb_folio to set the destructor */
-	folio_set_order(folio, order);
+
 	for (i = 0; i < nr_pages; i++) {
 		p = folio_page(folio, i);
 
@@ -1969,7 +1967,7 @@ static bool __prep_compound_gigantic_folio(struct folio *folio,
 		 * on the head page when they need know if put_page() is needed
 		 * after get_user_pages().
 		 */
-		if (i != 0)	/* head page cleared above */
+		if (i != 0)	/* head page cleared below */
 			__ClearPageReserved(p);
 		/*
 		 * Subtle and very unlikely
@@ -1996,8 +1994,14 @@ static bool __prep_compound_gigantic_folio(struct folio *folio,
 		} else {
 			VM_BUG_ON_PAGE(page_count(p), p);
 		}
-		if (i != 0)
+
+		if (i == 0) {
+			__folio_clear_reserved(folio);
+			__folio_set_head(folio);
+			folio_set_order(folio, order);
+		} else {
 			set_compound_head(p, &folio->page);
+		}
 	}
 	atomic_set(&folio->_entire_mapcount, -1);
 	atomic_set(&folio->_nr_pages_mapped, 0);
@@ -2017,7 +2021,6 @@ static bool __prep_compound_gigantic_folio(struct folio *folio,
 		p = folio_page(folio, j);
 		__ClearPageReserved(p);
 	}
-	folio_set_order(folio, 0);
 	__folio_clear_head(folio);
 	return false;
 }


>
> struct page {
> ...
> struct address_space * mapping; /* 24 8 */
> ...
> }
>
> struct folio {
> ...
> union {
> struct {
> long unsigned int _flags_1; /* 64 8 */
> long unsigned int _head_1; /* 72 8 */
> unsigned char _folio_dtor; /* 80 1 */
> unsigned char _folio_order; /* 81 1 */
>
> /* XXX 2 bytes hole, try to pack */
>
> atomic_t _entire_mapcount; /* 84 4 */
> atomic_t _nr_pages_mapped; /* 88 4 */
> atomic_t _pincount; /* 92 4 */
> unsigned int _folio_nr_pages; /* 96 4 */
> }; /* 64 40 */
> struct page __page_1 __attribute__((__aligned__(8))); /* 64 64 */
> }
> ...
> }

I do not think the copy of page/folio definitions adds much value to the
commit message.

--
Mike Kravetz