Re: [PATCH] mm/huge_memory: fix swap entry values of tail pages of THP

From: Matthew Wilcox
Date: Tue Feb 13 2024 - 03:54:21 EST


On Tue, Feb 13, 2024 at 02:18:10PM +0530, Charan Teja Kalla wrote:
> An anon THP is first added to the swap cache before it is reclaimed.
> Initially, each tail page contains the proper swap entry value (stored in
> the ->private field), which is filled in by add_to_swap_cache(). After
> a THP sitting in the swap cache is migrated, only the swap entry of
> the head page is filled in (see folio_migrate_mapping()).
>
> When this page is later split (one case is when the page is migrated
> again, see migrate_pages()->try_split_thp()), the ->private fields of
> the tail pages do not hold the proper swap entry values. When such a
> tail page is then freed, delete_from_swap_cache() is called as part of
> that. It operates on the wrong swap cache index, replaces the entry at
> that wrong index with a shadow/NULL value, and the page is freed.
>
> This leaves the swap cache containing a freed page.
> The issue can manifest in many forms; the most commonly observed one
> is an RCU stall during swapin (see mapping_get_entry()).
>
> On recent kernels, this issue is indirectly fixed by the
> series [1], specifically by [2].
>
> Backporting this series produced many merge conflicts, and it also
> appears to depend on many other changes. As backporting it to LTS
> branches is not trivial, a change similar to [2] is picked as the
> fix.
>
> [1] https://lore.kernel.org/all/20230821160849.531668-1-david@xxxxxxxxxx/
> [2] https://lore.kernel.org/all/20230821160849.531668-5-david@xxxxxxxxxx/

I am deeply confused by this commit message.

Are you saying there is a problem in current HEAD which this fixes, or
are you saying that this problem has already been fixed, and this patch
is for older kernels?

> Closes: https://lore.kernel.org/linux-mm/69cb784f-578d-ded1-cd9f-c6db04696336@xxxxxxxxxxx/
> Fixes: 3417013e0d18 ("mm/migrate: Add folio_migrate_mapping()")
> Cc: <stable@xxxxxxxxxxxxxxx> # see patch description, applicable to <=6.1
> Signed-off-by: Charan Teja Kalla <quic_charante@xxxxxxxxxxx>
> ---
> mm/huge_memory.c | 2 ++
> 1 file changed, 2 insertions(+)
>
> diff --git a/mm/huge_memory.c b/mm/huge_memory.c
> index 5957794..cc5273f 100644
> --- a/mm/huge_memory.c
> +++ b/mm/huge_memory.c
> @@ -2477,6 +2477,8 @@ static void __split_huge_page_tail(struct page *head, int tail,
>  	if (!folio_test_swapcache(page_folio(head))) {
>  		VM_WARN_ON_ONCE_PAGE(page_tail->private != 0, page_tail);
>  		page_tail->private = 0;
> +	} else {
> +		set_page_private(page_tail, (unsigned long)head->private + tail);
>  	}
>
>  	/* Page flags must be visible before we make the page non-compound. */
> --
> 2.7.4
>
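
For anyone following along, the sequence described in the quoted commit
message can be sketched as a small user-space model. This is only an
illustration, not kernel code: struct toy_page, NR_SUBPAGES and all of
the toy_*() helpers are hypothetical names, and the only behaviour
modelled is what the commit message states (add_to_swap_cache() fills
each subpage's ->private, migration carries over only the head's value,
and the proposed hunk recomputes a tail's value as head->private + tail).

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical subpage count, kept small for illustration. */
#define NR_SUBPAGES 8

/* Toy stand-in for struct page; only ->private is modelled. */
struct toy_page {
	unsigned long private;
};

/* Models add_to_swap_cache(): subpage i holds swap entry value entry + i. */
static void toy_add_to_swap_cache(struct toy_page *thp, unsigned long entry)
{
	for (size_t i = 0; i < NR_SUBPAGES; i++)
		thp[i].private = entry + i;
}

/* Models the migration described above: only the head's value is copied. */
static void toy_migrate(struct toy_page *dst, const struct toy_page *src)
{
	dst[0].private = src[0].private;
	for (size_t i = 1; i < NR_SUBPAGES; i++)
		dst[i].private = 0;
}

/*
 * Models the hunk in __split_huge_page_tail(). With apply_fix == false the
 * swap cache case leaves the tail's ->private untouched (the old behaviour);
 * with apply_fix == true it is rebuilt from the head, as the patch proposes.
 */
static void toy_split_tail(struct toy_page *thp, size_t tail,
			   bool in_swapcache, bool apply_fix)
{
	if (!in_swapcache)
		thp[tail].private = 0;
	else if (apply_fix)
		thp[tail].private = thp[0].private + tail;
}
```

In this model, splitting after toy_migrate() without the fix leaves a
tail with ->private == 0, so anything keyed on that value would look at
the wrong swap cache index. With the fix, the tail gets back the value
that toy_add_to_swap_cache() originally assigned.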