Re: [PATCH v2] mm/migrate: put dest folio on deferred split list if source was there.

From: Matthew Wilcox
Date: Mon Mar 11 2024 - 23:46:14 EST


On Mon, Mar 11, 2024 at 03:58:48PM -0400, Zi Yan wrote:
> @@ -1168,6 +1172,17 @@ static int migrate_folio_unmap(new_folio_t get_new_folio,
> folio_lock(src);
> }
> locked = true;
> + if (folio_test_large_rmappable(src) &&
> + !list_empty(&src->_deferred_list)) {
> + struct deferred_split *ds_queue = get_deferred_split_queue(src);
> +
> + spin_lock(&ds_queue->split_queue_lock);
> + ds_queue->split_queue_len--;
> + list_del_init(&src->_deferred_list);
> + spin_unlock(&ds_queue->split_queue_lock);
> + old_page_state |= PAGE_WAS_ON_DEFERRED_LIST;
> + }

I have a few problems with this ...

Trivial: your whitespace is utterly broken. You can't use a single tab
for both indicating control flow change and for line-too-long.

Slightly more important: You're checking list_empty outside the lock
(which is fine in order to avoid unnecessarily acquiring the lock),
but you need to re-check it inside the lock in case of a race. And you
didn't mark it as data_race(), so KCSAN will whinge.
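
To make the check / lock / re-check pattern concrete, here is a minimal
userspace sketch. It is an illustration only, not kernel code: toy_node,
toy_queue and the atomic_flag spinlock are hypothetical stand-ins for
list_head, struct deferred_split and split_queue_lock. In the kernel the
unlocked peek would be wrapped in data_race().

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical stand-in for a list_head embedded in the folio. */
struct toy_node {
	struct toy_node *prev, *next;
};

/* Hypothetical stand-in for struct deferred_split. */
struct toy_queue {
	atomic_flag lock;	/* toy spinlock */
	struct toy_node head;
	unsigned long len;
};

static void toy_lock(struct toy_queue *q)
{
	while (atomic_flag_test_and_set(&q->lock))
		;	/* spin until the flag is released */
}

static void toy_unlock(struct toy_queue *q)
{
	atomic_flag_clear(&q->lock);
}

static void toy_node_init(struct toy_node *n)
{
	n->prev = n->next = n;
}

/* A node that points at itself is not on any list. */
static bool toy_node_unlinked(const struct toy_node *n)
{
	return n->next == n;
}

static void toy_queue_init(struct toy_queue *q)
{
	atomic_flag_clear(&q->lock);
	toy_node_init(&q->head);
	q->len = 0;
}

static void toy_queue_add(struct toy_queue *q, struct toy_node *n)
{
	toy_lock(q);
	n->next = q->head.next;
	n->prev = &q->head;
	q->head.next->prev = n;
	q->head.next = n;
	q->len++;
	toy_unlock(q);
}

/* Returns true only if this caller took the node off the queue. */
static bool toy_queue_remove_if_queued(struct toy_queue *q, struct toy_node *n)
{
	bool removed = false;

	/*
	 * Unlocked peek: a cheap fast path whose answer may be stale.
	 * The kernel equivalent would be data_race(list_empty(...)).
	 */
	if (toy_node_unlinked(n))
		return false;

	toy_lock(q);
	/* Re-check under the lock: someone may have removed it meanwhile. */
	if (!toy_node_unlinked(n)) {
		n->prev->next = n->next;
		n->next->prev = n->prev;
		toy_node_init(n);
		q->len--;
		removed = true;
	}
	toy_unlock(q);

	return removed;
}
```

The length is only decremented when the locked re-check still sees the
node queued, so a racing remover cannot cause a double decrement.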

Much more important: You're doing this with a positive refcount, which
breaks the (undocumented) logic in deferred_split_scan() that a folio
with a positive refcount will not be removed from the list.

Maximally important: We shouldn't be doing any of this! This folio is
on the deferred split list. We shouldn't be migrating it as a single
entity; we should be splitting it now that we're in a context where we
can do the right thing and split it. Documentation/mm/transhuge.rst
is clear that we don't split it straight away due to locking context.
Splitting it on migration is clearly the right thing to do.

If splitting fails, we should just fail the migration; splitting fails
due to excess references, and if the source folio has excess references,
then migration would fail too.
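
As a rough model of that argument (a toy, not the real migration path;
toy_folio, toy_split() and toy_migrate() are made-up names, and the real
code would go on to migrate the resulting small folios individually):

```c
#include <stdbool.h>

/* Hypothetical model of a folio; the field names are assumptions. */
struct toy_folio {
	int refcount;		/* references currently held */
	int expected_refs;	/* references the caller can account for */
	bool is_large;
	bool on_deferred_list;
};

enum { TOY_OK = 0, TOY_EAGAIN = -11 };

static bool toy_has_extra_refs(const struct toy_folio *f)
{
	return f->refcount != f->expected_refs;
}

/* Splitting fails when somebody else holds a reference. */
static int toy_split(struct toy_folio *f)
{
	if (toy_has_extra_refs(f))
		return TOY_EAGAIN;
	f->is_large = false;
	f->on_deferred_list = false;
	return TOY_OK;
}

/* Split first if the folio was queued for deferred split, then migrate. */
static int toy_migrate(struct toy_folio *f)
{
	if (f->is_large && f->on_deferred_list) {
		int ret = toy_split(f);

		if (ret)
			return ret;	/* split failed: fail the migration */
	}

	/* Migration has the same precondition as the split. */
	if (toy_has_extra_refs(f))
		return TOY_EAGAIN;

	return TOY_OK;
}
```

Both steps fail on the same condition, so bailing out when the split
fails does not lose any migration that would otherwise have succeeded.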