Re: [PATCH v3 4/4] mm: swap: Swap-out small-sized THP without splitting

From: Barry Song
Date: Sat Nov 04 2023 - 01:50:08 EST


>> #define __HAVE_ARCH_PREPARE_TO_SWAP
>> -static inline int arch_prepare_to_swap(struct page *page)
>> -{
>> - if (system_supports_mte())
>> - return mte_save_tags(page);
>> - return 0;
>> -}
>> +#define arch_prepare_to_swap arch_prepare_to_swap
>> +extern int arch_prepare_to_swap(struct page *page);
>
> I think it would be better to modify this API to take a folio explicitly. The
> caller already has the folio.

agree. that was actually what i thought I should change while making this rfc,
though i didn't do it.

>> +int arch_prepare_to_swap(struct page *page)
>> +{
>> + if (system_supports_mte()) {
>> + struct folio *folio = page_folio(page);
>> + long i, nr = folio_nr_pages(folio);
>> + for (i = 0; i < nr; i++)
>> + return mte_save_tags(folio_page(folio, i));
>
> This will return after saving the first page of the folio! You will need to add
> each page in a loop, and if you get an error at any point, you will need to
> remove the pages that you already added successfully, by calling
> arch_swap_invalidate_page() as far as I can see. Steven can you confirm?

right. oops...

>
>> + }
>> + return 0;
>> +}
>> +
>> +void arch_swap_restore(swp_entry_t entry, struct folio *folio)
>> +{
>> + if (system_supports_mte()) {
>> + long i, nr = folio_nr_pages(folio);
>> + for (i = 0; i < nr; i++)
>> + mte_restore_tags(entry, folio_page(folio, i));
>
> swap-in currently doesn't support large folios - everything is a single page
> folio. So this isn't technically needed. But from the API POV, it seems
> reasonable to make this change - except your implementation is broken. You are
> currently setting every page in the folio to use the same tags as the first
> page. You need to increment the swap entry for each page.

one case is that we have a chance to "swapin" a folio which is still in swapcache
and hasn't been dropped yet. i mean the process's ptes have been swap entry, but
the large folio is still in swapcache. in this case, we will hit the swapcache
while swapping in, thus we are handling a large folio. in this case, it seems
we are restoring tags multiple times? i mean, if large folio has 16 basepages,
for each page fault of each base page, we are restoring a large folio, then
for 16 page faults, we are duplicating the restore.
any thought to handle this situation? should we move arch_swap_restore() to take
page rather than folio since swapin only supports basepage at this moment.

> Thanks,
> Ryan

Thanks
Barry