Re: [PATCH] mm/huge_memory: fix swap entry values of tail pages of THP

From: David Hildenbrand
Date: Tue Feb 13 2024 - 04:25:38 EST


On 13.02.24 09:48, Charan Teja Kalla wrote:
An anon THP page is first added to swap cache before reclaiming it.
Initially, each tail page contains the proper swap entry value(stored in
->private field) which is filled from add_to_swap_cache(). After

This is a stable-only fix? In that case, it make sense to indicate that in the patch subject [PATCH STABLE vX.Y].

But it's always odd to have stable-only fixes, the docs [1] don't cover that (maybe they should? or I missed it).

[1] https://www.kernel.org/doc/Documentation/process/stable-kernel-rules.rst


So we are migrating a THP that was added to the swapcache. Do you have a reproducer?

migrating the THP page sitting on the swap cache, only the swap entry of
the head page is filled(see folio_migrate_mapping()).

Now when this page is tried to split(one case is when this page is again
migrated, see migrate_pages()->try_split_thp()), the tail pages
->private is not stored with proper swap entry values. When this tail
page is now try to be freed, as part of it delete_from_swap_cache() is
called which operates on the wrong swap cache index and eventually
replaces the wrong swap cache index with shadow/NULL value, frees the
page.

But what if we don't split the THP after migration. Is there anything else that can go wrong?

It's sufficient to look where upstream calls page_swap_entry():

For example, do_swap_page() will never be able to make progress, because the swap entry of the page does not correspond to the swap entry in the PTE? It can easily fault on a swap PTE that refers a THP subpage in the swapcache.

So unless I am missing something, only fixing this up during the split is insufficient. You have to fix it up during migration.


This leads to the state with a swap cache containing the freed page.
This issue can manifest in many forms and the most common thing observed
is the rcu stall during the swapin (see mapping_get_entry()).

On the recent kernels, this issues is indirectly getting fixed with the
series[1], to be specific[2].

When tried to back port this series, it is observed many merge
conflicts and also seems dependent on many other changes. As backporting
to LTS branches is not a trivial one, the similar change from [2] is
picked as a fix.

The fix is in

commit cfeed8ffe55b37fa10286aaaa1369da00cb88440
Author: David Hildenbrand <david@xxxxxxxxxx>
Date: Mon Aug 21 18:08:46 2023 +0200

mm/swap: stop using page->private on tail pages for THP_SWAP


[1] https://lore.kernel.org/all/20230821160849.531668-1-david@xxxxxxxxxx/
[2] https://lore.kernel.org/all/20230821160849.531668-5-david@xxxxxxxxxx/

Closes: https://lore.kernel.org/linux-mm/69cb784f-578d-ded1-cd9f-c6db04696336@xxxxxxxxxxx/
Fixes: 3417013e0d18 ("mm/migrate: Add folio_migrate_mapping()")
Cc: <stable@xxxxxxxxxxxxxxx> # see patch description, applicable to <=6.1
Signed-off-by: Charan Teja Kalla <quic_charante@xxxxxxxxxxx>

3417013e0d18 went into 5.16.

cfeed8ffe55b3 went into 6.6.

So only 6.1 is affected.


Isn't there a way to bite the bullet and backport that series to 6.1 instead?

--
Cheers,

David / dhildenb