Re: [RFC 0/6] migrate_pages(): batch TLB flushing

From: Hesham Almatary
Date: Wed Nov 02 2022 - 10:13:44 EST



On 11/2/2022 3:14 AM, Huang, Ying wrote:
Hesham Almatary <hesham.almatary@xxxxxxxxxx> writes:

On 9/27/2022 12:21 PM, haoxin wrote:
Hi, Huang

On 2022/9/21 2:06 PM, Huang Ying wrote:
From: "Huang, Ying" <ying.huang@xxxxxxxxx>

Currently, migrate_pages() migrates pages one by one, as in the
following pseudocode,

  for each page
    unmap
    flush TLB
    copy
    restore map

If multiple pages are passed to migrate_pages(), there are
opportunities to batch the TLB flushing and copying. That is, we can
change the code to something as follows,

  for each page
    unmap
  for each page
    flush TLB
  for each page
    copy
  for each page
    restore map

The total number of TLB flushing IPIs can be reduced considerably. We
may also use a hardware accelerator such as DSA to accelerate the
page copying.
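
The difference between the two orderings can be illustrated with a small counting model. This is a minimal sketch in Python, not kernel code: the function names are hypothetical, and it assumes that the batched variant issues a single flush covering every page unmapped in the batch, while the one-by-one variant flushes once per page.

```python
def migrate_one_by_one(pages):
    """Model of the current flow: unmap, flush, copy, remap per page."""
    log = []
    for page in pages:
        log += [("unmap", page), ("flush", page), ("copy", page), ("remap", page)]
    return log


def migrate_batched(pages):
    """Model of the batched flow: each phase runs over the whole batch."""
    log = [("unmap", page) for page in pages]
    if pages:
        # Assumption: one flush covers all pages unmapped above.
        log.append(("flush", tuple(pages)))
    log += [("copy", page) for page in pages]
    log += [("remap", page) for page in pages]
    return log


def count_flushes(log):
    """Count how many flush operations a modelled migration issued."""
    return sum(1 for op, _ in log if op == "flush")
```

In this model, migrating N pages one by one records N flushes, while the batched flow records one, which is where the reduction in flushing IPIs would come from.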

So in this patch, we refactor the migrate_pages() implementation and
implement the TLB flushing batching. Based on this, hardware
accelerated page copying can be implemented.

If too many pages are passed to migrate_pages(), the naive batched
implementation may unmap too many pages at the same time. This
increases the chance that a task has to wait for the migrated pages
to be mapped again, so latency may suffer. To deal with this issue,
the number of pages unmapped in one batch is restricted to no more
than HPAGE_PMD_NR. That is, the impact is at the same level as THP
migration.
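
The batch-size cap can be sketched as a simple chunking step. The following Python model is illustrative only: split_into_batches() is a hypothetical helper, and the value 512 for HPAGE_PMD_NR assumes 4 KiB base pages with 2 MiB PMD-sized huge pages (as on x86-64); other configurations use other values.

```python
# Assumption: 4 KiB base pages and 2 MiB PMD-sized huge pages.
HPAGE_PMD_NR = 512


def split_into_batches(pages, max_batch=HPAGE_PMD_NR):
    """Split a page list into batches of at most max_batch pages.

    Each batch would then go through unmap, flush, copy, and remap
    before the next batch starts, so at most max_batch pages are
    unmapped at any one time.
    """
    if max_batch <= 0:
        raise ValueError("max_batch must be positive")
    return [pages[i:i + max_batch] for i in range(0, len(pages), max_batch)]
```

For example, a request for 1300 pages would be handled as three batches of 512, 512, and 276 pages under these assumptions.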

We use the following test to measure the performance impact of the
patchset,

On a 2-socket Intel server,

- Run pmbench memory accessing benchmark

- Run `migratepages` to migrate pages of pmbench between node 0 and
  node 1 back and forth.

As pmbench cannot run on arm64 machines, I used lmbench instead.
My test case is as follows (I am not sure whether it is reasonable,
but it seems to work):
./bw_mem -N10000 10000m rd &
time migratepages pid node0 node1

FYI, I have ported pmbench to AArch64 [1]. The project seems to be
abandoned on Bitbucket. I wonder if it makes sense to fork it
elsewhere and push the pending PRs there.


[1] https://bitbucket.org/jisooy/pmbench/pull-requests/5
Maybe try contacting the original author by email first?

That's a good idea. I'm not planning to fork/maintain it myself, but if
anyone is interested in doing so, I am happy to help out and submit PRs
there.


Best Regards,
Huang, Ying

w/o patch        w/ patch
real  0m0.035s   real  0m0.024s
user  0m0.000s   user  0m0.000s
sys   0m0.035s   sys   0m0.024s

The migratepages time is reduced by about 31% (from 0.035s to 0.024s).

But there is a problem. I see that the batch flush is called through
migrate_pages_batch
  try_to_unmap_flush
    arch_tlbbatch_flush(&tlb_ubc->arch); // this is where the batch flush really happens

But on arm64, arch_tlbbatch_flush() is not supported, because arm64 does
not support CONFIG_ARCH_WANT_BATCHED_UNMAP_TLB_FLUSH yet.

So the TLB batch flush does no flushing at all there; it is an empty function.
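
The effect of the missing config option can be modelled as follows. This is a hedged sketch in Python, not kernel code: count_flushes() and its parameter are hypothetical, and it assumes that when batched unmap flushing is unavailable, the unmap path flushes each page immediately rather than deferring, so the later batch flush finds nothing pending.

```python
def count_flushes(nr_pages, batched_unmap_supported):
    """Return (immediate_flushes, batch_flushes) for one modelled batch."""
    immediate = 0
    pending = False
    for _ in range(nr_pages):
        if batched_unmap_supported:
            # Defer: only record that a flush is needed later.
            pending = True
        else:
            # Assumption: without deferral, flush while unmapping.
            immediate += 1
    # The batch flush is effectively a no-op when nothing is pending.
    batch = 1 if pending else 0
    return immediate, batch
```

Under these assumptions, an architecture without the option would still flush correctly but would not see the flush count drop, which is why enabling batched unmap flushing on arm64 matters for this series.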

Maybe this patch can help solve this problem.
https://lore.kernel.org/linux-arm-kernel/20220921084302.43631-1-yangyicong@xxxxxxxxxx/T/