Re: [PATCH v1 7/9] mm/mmu_gather: add __tlb_remove_folio_pages()

From: David Hildenbrand
Date: Tue Jan 30 2024 - 04:33:46 EST


On 30.01.24 10:21, Ryan Roberts wrote:
On 29/01/2024 14:32, David Hildenbrand wrote:
Add __tlb_remove_folio_pages(), which will remove multiple consecutive
pages that belong to the same large folio, instead of only a single
page. We'll be using this function when optimizing unmapping/zapping of
large folios that are mapped by PTEs.

We're using the remaining spare bit in an encoded_page to indicate that
the next enoced page in an array contains actually shifted "nr_pages".
Teach swap/freeing code about putting multiple folio references, and
delayed rmap handling to remove page ranges of a folio.

This extension allows for still gathering almost as many small folios
as we used to (-1, because we have to prepare for a possibly bigger next
entry), but still allows for gathering consecutive pages that belong to the
same large folio.

Note that we don't pass the folio pointer, because it is not required for
now. Further, we don't support page_size != PAGE_SIZE, it won't be
required for simple PTE batching.

We have to provide a separate s390 implementation, but it's fairly
straight forward.

Another, more invasive and likely more expensive, approach would be to
use folio+range or a PFN range instead of page+nr_pages. But, we should
do that consistently for the whole mmu_gather. For now, let's keep it
simple and add "nr_pages" only.

Signed-off-by: David Hildenbrand <david@xxxxxxxxxx>
---
arch/s390/include/asm/tlb.h | 17 +++++++++++
include/asm-generic/tlb.h | 8 +++++
include/linux/mm_types.h | 20 ++++++++++++
mm/mmu_gather.c | 61 +++++++++++++++++++++++++++++++------
mm/swap.c | 12 ++++++--
mm/swap_state.c | 12 ++++++--
6 files changed, 116 insertions(+), 14 deletions(-)

diff --git a/arch/s390/include/asm/tlb.h b/arch/s390/include/asm/tlb.h
index 48df896d5b79..abfd2bf29e9e 100644
--- a/arch/s390/include/asm/tlb.h
+++ b/arch/s390/include/asm/tlb.h
@@ -26,6 +26,8 @@ void __tlb_remove_table(void *_table);
static inline void tlb_flush(struct mmu_gather *tlb);
static inline bool __tlb_remove_page_size(struct mmu_gather *tlb,
struct page *page, bool delay_rmap, int page_size);
+static inline bool __tlb_remove_folio_pages(struct mmu_gather *tlb,
+ struct page *page, unsigned int nr_pages, bool delay_rmap);
#define tlb_flush tlb_flush
#define pte_free_tlb pte_free_tlb
@@ -52,6 +54,21 @@ static inline bool __tlb_remove_page_size(struct mmu_gather *tlb,
return false;
}
+static inline bool __tlb_remove_folio_pages(struct mmu_gather *tlb,
+ struct page *page, unsigned int nr_pages, bool delay_rmap)
+{
+ struct encoded_page *encoded_pages[] = {
+ encode_page(page, ENCODED_PAGE_BIT_NR_PAGES),
+ encode_nr_pages(nr_pages),
+ };
+
+ VM_WARN_ON_ONCE(delay_rmap);
+ VM_WARN_ON_ONCE(page_folio(page) != page_folio(page + nr_pages - 1));
+
+ free_pages_and_swap_cache(encoded_pages, ARRAY_SIZE(encoded_pages));
+ return false;
+}
+
static inline void tlb_flush(struct mmu_gather *tlb)
{
__tlb_flush_mm_lazy(tlb->mm);
diff --git a/include/asm-generic/tlb.h b/include/asm-generic/tlb.h
index 2eb7b0d4f5d2..428c3f93addc 100644
--- a/include/asm-generic/tlb.h
+++ b/include/asm-generic/tlb.h
@@ -69,6 +69,7 @@
*
* - tlb_remove_page() / __tlb_remove_page()
* - tlb_remove_page_size() / __tlb_remove_page_size()
+ * - __tlb_remove_folio_pages()
*
* __tlb_remove_page_size() is the basic primitive that queues a page for
* freeing. __tlb_remove_page() assumes PAGE_SIZE. Both will return a
@@ -78,6 +79,11 @@
* tlb_remove_page() and tlb_remove_page_size() imply the call to
* tlb_flush_mmu() when required and has no return value.
*
+ * __tlb_remove_folio_pages() is similar to __tlb_remove_page(), however,
+ * instead of removing a single page, remove the given number of consecutive
+ * pages that are all part of the same (large) folio: just like calling
+ * __tlb_remove_page() on each page individually.
+ *
* - tlb_change_page_size()
*
* call before __tlb_remove_page*() to set the current page-size; implies a
@@ -262,6 +268,8 @@ struct mmu_gather_batch {
extern bool __tlb_remove_page_size(struct mmu_gather *tlb, struct page *page,
bool delay_rmap, int page_size);
+bool __tlb_remove_folio_pages(struct mmu_gather *tlb, struct page *page,
+ unsigned int nr_pages, bool delay_rmap);
#ifdef CONFIG_SMP
/*
diff --git a/include/linux/mm_types.h b/include/linux/mm_types.h
index 1b89eec0d6df..198662b7a39a 100644
--- a/include/linux/mm_types.h
+++ b/include/linux/mm_types.h
@@ -226,6 +226,15 @@ struct encoded_page;
/* Perform rmap removal after we have flushed the TLB. */
#define ENCODED_PAGE_BIT_DELAY_RMAP 1ul
+/*
+ * The next item in an encoded_page array is the "nr_pages" argument, specifying
+ * the number of consecutive pages starting from this page, that all belong to
+ * the same folio. For example, "nr_pages" corresponds to the number of folio
+ * references that must be dropped. If this bit is not set, "nr_pages" is
+ * implicitly 1.
+ */
+#define ENCODED_PAGE_BIT_NR_PAGES 2ul

nit: Perhaps this should be called ENCODED_PAGE_BIT_NR_PAGES_NEXT? There are a
couple of places where you check for this bit on the current entry and advance
to the next. So the "_NEXT" might make things clearer?

Yes, makes sense, thanks for the suggestion.

--
Cheers,

David / dhildenb