Re: [PATCH] ksm: delay the check of splitting compound pages

From: xu
Date: Tue Nov 14 2023 - 22:11:39 EST


>> diff --git a/mm/ksm.c b/mm/ksm.c
>> index 7efcc68ccc6e..c952fe5d9e43 100644
>> --- a/mm/ksm.c
>> +++ b/mm/ksm.c
>> @@ -2229,24 +2229,10 @@ static void cmp_and_merge_page(struct page *page, struct ksm_rmap_item *rmap_ite
>> tree_rmap_item =
>> unstable_tree_search_insert(rmap_item, page, &tree_page);
>> if (tree_rmap_item) {
>> - bool split;
>> -
>> kpage = try_to_merge_two_pages(rmap_item, page,
>> tree_rmap_item, tree_page);
>> - /*
>> - * If both pages we tried to merge belong to the same compound
>> - * page, then we actually ended up increasing the reference
>> - * count of the same compound page twice, and split_huge_page
>> - * failed.
>> - * Here we set a flag if that happened, and we use it later to
>> - * try split_huge_page again. Since we call put_page right
>> - * afterwards, the reference count will be correct and
>> - * split_huge_page should succeed.
>> - */
>
>I'm curious, why can't we detect that ahead of time and keep only a
>single reference? Why do we need the backup code? Anything I am missing?

I don't know the original reason, better ask Claudio Imbrenda <imbrenda@xxxxxxxxxxxxxxxxxx>.
Maybe because doing detection that ahead of time will break several funtions' semantic,
such as try_to_merge_two_pages(), try_to_merge_with_ksm_page() and try_to_merge_one_page()

Adding the backup code don't change the old code and fixing the old problem, it's good.

>
>> - split = PageTransCompound(page)
>> - && compound_head(page) == compound_head(tree_page);
>> - put_page(tree_page);
>> if (kpage) {
>> + put_page(tree_page);
>> /*
>> * The pages were successfully merged: insert new
>> * node in the stable tree and add both rmap_items.
>> @@ -2271,7 +2257,25 @@ static void cmp_and_merge_page(struct page *page, struct ksm_rmap_item *rmap_ite
>> break_cow(tree_rmap_item);
>> break_cow(rmap_item);
>> }
>> - } else if (split) {
>> + } else {
>> + bool split;
>> + /*
>> + * If both pages we tried to merge belong to the same compound
>> + * page, then we actually ended up increasing the reference
>> + * count of the same compound page twice, and split_huge_page
>> + * failed.
>> + * Here we set a flag if that happened, and we use it later to
>> + * try split_huge_page again. Since we call put_page right
>> + * afterwards, the reference count will be correct and
>> + * split_huge_page should succeed.
>> + */
>> +
>> + split = PageTransCompound(page)
>> + && compound_head(page) == compound_head(tree_page);
>
>Would
>
>split = page_folio(page) == page_folio(tree_page);
>
>do the trick? No need to mess with compound pages.

In terms of function correctness, it should work correctly because here 'page' and 'tree_page' are never
the same page, which is guaranteed by unstable_tree_search_insert(). But it's not very intuitive, maybe
ww need to add some comment.