Re: [PATCH] [RFC] mm: migrate: rcu stalls because of invalid swap cache entries

From: Matthew Wilcox
Date: Tue Nov 21 2023 - 11:14:01 EST


On Tue, Nov 21, 2023 at 06:00:40PM +0530, Charan Teja Kalla wrote:
> The below race on a folio between reclaim and migration exposed a bug
> of not populating the swap cache with proper folio resulting into the
> rcu stalls:

Thank you for figuring out this race and describing it so well.
It explains a few things I've seen, at least potentially.

What would you think to this? I think a better fix would be to
fix the swap cache to use multi-order entries, but I would like to
see this backportable!
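
For anyone following along, here is a toy userspace model of the problem
the patch below addresses. It is only a sketch, not kernel code: the
toy_* names and the flat array are made up for illustration, and it
assumes the swap cache holds one slot per page index, so a folio of nr
pages occupies nr consecutive slots. Replacing only the first slot at
migration time leaves nr - 1 slots pointing at the old folio.

```c
#include <assert.h>
#include <stddef.h>

#define CACHE_SLOTS 16

/* Hypothetical stand-in for the swap cache: one slot per page index. */
struct toy_cache {
	const void *slot[CACHE_SLOTS];
};

/* Add a folio of nr pages at index: every covered slot points to it. */
static void toy_add(struct toy_cache *c, size_t index, size_t nr,
		    const void *folio)
{
	assert(index + nr <= CACHE_SLOTS);
	for (size_t i = 0; i < nr; i++)
		c->slot[index + i] = folio;
}

/* Models a single store: only the first slot sees the new folio. */
static void toy_migrate_first_only(struct toy_cache *c, size_t index,
				   const void *newfolio)
{
	c->slot[index] = newfolio;
}

/* Models the loop in the patch: all nr slots are switched over. */
static void toy_migrate_all(struct toy_cache *c, size_t index, size_t nr,
			    const void *newfolio)
{
	for (size_t i = 0; i < nr; i++)
		c->slot[index + i] = newfolio;
}

/* Count slots in [index, index + nr) still referencing the old folio. */
static size_t toy_count_stale(const struct toy_cache *c, size_t index,
			      size_t nr, const void *old)
{
	size_t stale = 0;

	for (size_t i = 0; i < nr; i++)
		if (c->slot[index + i] == old)
			stale++;
	return stale;
}
```

In this model, a 4-page folio migrated with toy_migrate_first_only()
leaves three stale slots, while toy_migrate_all() leaves none.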

diff --git a/mm/migrate.c b/mm/migrate.c
index d9d2b9432e81..2d67ca47d2e2 100644
--- a/mm/migrate.c
+++ b/mm/migrate.c
@@ -405,6 +405,7 @@ int folio_migrate_mapping(struct address_space *mapping,
 	int dirty;
 	int expected_count = folio_expected_refs(mapping, folio) + extra_count;
 	long nr = folio_nr_pages(folio);
+	long entries, i;
 
 	if (!mapping) {
 		/* Anonymous page without mapping */
@@ -442,8 +443,10 @@ int folio_migrate_mapping(struct address_space *mapping,
 			folio_set_swapcache(newfolio);
 			newfolio->private = folio_get_private(folio);
 		}
+		entries = nr;
 	} else {
 		VM_BUG_ON_FOLIO(folio_test_swapcache(folio), folio);
+		entries = 1;
 	}
 
 	/* Move dirty while page refs frozen and newpage not yet exposed */
@@ -453,7 +456,11 @@ int folio_migrate_mapping(struct address_space *mapping,
 		folio_set_dirty(newfolio);
 	}
 
-	xas_store(&xas, newfolio);
+	/* Swap cache still stores N entries instead of a high-order entry */
+	for (i = 0; i < entries; i++) {
+		xas_store(&xas, newfolio);
+		xas_next(&xas);
+	}
 
 	/*
 	 * Drop cache reference from old page by unfreezing