Re: [PATCH v2] mm: migrate: record the mlocked page status to remove unnecessary lru drain

From: Baolin Wang
Date: Fri Oct 20 2023 - 23:31:16 EST




On 10/20/2023 12:48 PM, Hugh Dickins wrote:
On Fri, 20 Oct 2023, Baolin Wang wrote:

When doing compaction, I found the lru_add_drain() is an obvious hotspot
when migrating pages. The distribution of this hotspot is as follows:
- 18.75% compact_zone
- 17.39% migrate_pages
- 13.79% migrate_pages_batch
- 11.66% migrate_folio_move
- 7.02% lru_add_drain
+ 7.02% lru_add_drain_cpu
+ 3.00% move_to_new_folio
1.23% rmap_walk
+ 1.92% migrate_folio_unmap
+ 3.20% migrate_pages_sync
+ 0.90% isolate_migratepages

The lru_add_drain() was added by commit c3096e6782b7 ("mm/migrate:
__unmap_and_move() push good newpage to LRU") to drain the newpage to LRU
immediately, to help to build up the correct newpage->mlock_count in
remove_migration_ptes() for mlocked pages. However, if there are no mlocked
pages are migrating, then we can avoid this lru drain operation, especailly
for the heavy concurrent scenarios.

So we can record the source pages' mlocked status in migrate_folio_unmap(),
and only drain the lru list when the mlocked status is set in migrate_folio_move().
In addition, the page was already isolated from lru when migrating, so checking
the mlocked status is stable by folio_test_mlocked() in migrate_folio_unmap().

After this patch, I can see the hotpot of the lru_add_drain() is gone:
- 9.41% migrate_pages_batch
- 6.15% migrate_folio_move
- 3.64% move_to_new_folio
+ 1.80% migrate_folio_extra
+ 1.70% buffer_migrate_folio
+ 1.41% rmap_walk
+ 0.62% folio_add_lru
+ 3.07% migrate_folio_unmap

Meanwhile, the compaction latency shows some improvements when running
thpscale:
base patched
Amean fault-both-1 1131.22 ( 0.00%) 1112.55 * 1.65%*
Amean fault-both-3 2489.75 ( 0.00%) 2324.15 * 6.65%*
Amean fault-both-5 3257.37 ( 0.00%) 3183.18 * 2.28%*
Amean fault-both-7 4257.99 ( 0.00%) 4079.04 * 4.20%*
Amean fault-both-12 6614.02 ( 0.00%) 6075.60 * 8.14%*
Amean fault-both-18 10607.78 ( 0.00%) 8978.86 * 15.36%*
Amean fault-both-24 14911.65 ( 0.00%) 11619.55 * 22.08%*
Amean fault-both-30 14954.67 ( 0.00%) 14925.66 * 0.19%*
Amean fault-both-32 16654.87 ( 0.00%) 15580.31 * 6.45%*


Seems a sensible change with good results (I'll conceal how little of
the stats I understand, I expect everyone else understands them: in my
naivety, I'm mainly curious why rmap_walk's 1.23% didn't get a + on it).

TBH, I also don't know why the rmap_walk didn't get a + on it, let me check it again.

Signed-off-by: Baolin Wang <baolin.wang@xxxxxxxxxxxxxxxxx>
---
Chages from v1:
- Use separate flags in __migrate_folio_record() to avoid to pack flags
in each call site per Ying.
---
mm/migrate.c | 47 +++++++++++++++++++++++++++++++++++------------
1 file changed, 35 insertions(+), 12 deletions(-)

diff --git a/mm/migrate.c b/mm/migrate.c
index 125194f5af0f..fac96139dbba 100644
--- a/mm/migrate.c
+++ b/mm/migrate.c
@@ -1027,22 +1027,39 @@ union migration_ptr {
struct anon_vma *anon_vma;
struct address_space *mapping;
};
+
+enum {
+ PAGE_WAS_MAPPED = 1 << 0,
+ PAGE_WAS_MLOCKED = 1 << 1,
+};
+

I was whispering to myself "I bet someone will suggest BIT()";
and indeed that someone has turned out to be Huang, Ying.

Sure.


static void __migrate_folio_record(struct folio *dst,
- unsigned long page_was_mapped,
+ unsigned int page_was_mapped,
+ unsigned int page_was_mlocked,
struct anon_vma *anon_vma)
{
union migration_ptr ptr = { .anon_vma = anon_vma };
+ unsigned long page_flags = 0;

Huang, Ying preferred a different name, me too: old_page_state?

OK, sounds better to me.


+
+ if (page_was_mapped)
+ page_flags |= PAGE_WAS_MAPPED;
+ if (page_was_mlocked)
+ page_flags |= PAGE_WAS_MLOCKED;

What's annoying me about the patch is all this mix of page_was_mapped and
page_was_mlocked variables, then the old_page_state bits. Can't it be
done with PAGE_WAS_ bits in old_page_state throughout, without any
page_was_mapped and page_was_mlocked variables?

Yes, good point. Let me try it. Thanks for your comments.