Re: [PATCH v4 2/5] arm64: mremap speedup - Enable HAVE_MOVE_PMD

From: Will Deacon
Date: Thu Oct 15 2020 - 06:55:58 EST


On Wed, Oct 14, 2020 at 12:53:07AM +0000, Kalesh Singh wrote:
> HAVE_MOVE_PMD enables remapping pages at the PMD level if both the
> source and destination addresses are PMD-aligned.
>
> HAVE_MOVE_PMD is already enabled on x86. The original patch [1] that
> introduced this config did not enable it on arm64 at the time because
> of performance issues with flushing the TLB on every PMD move. These
> issues have since been addressed in more recent releases with
> improvements to the arm64 TLB invalidation and core mmu_gather code as
> Will Deacon mentioned in [2].
>
> The data below show an approximately 8x improvement in mremap
> performance when HAVE_MOVE_PMD is enabled on arm64.
>
> --------- Test Results ----------
>
> The following results were obtained on an arm64 device running a 5.4
> kernel, by remapping a PMD-aligned, 1GB sized region to a PMD-aligned
> destination. The results from 10 iterations of the test are given below.
> All times are in nanoseconds.
>
> Control        HAVE_MOVE_PMD
>
> 9220833        1247761
> 9002552        1219896
> 9254115        1094792
> 8725885        1227760
> 9308646        1043698
> 9001667        1101771
> 8793385        1159896
> 8774636        1143594
> 9553125        1025833
> 9374010        1078125
>
> 9100885.4      1134312.6       <-- Mean time in nanoseconds
>
> Total mremap time for a 1GB PMD-aligned region drops from
> ~9.1 milliseconds to ~1.1 milliseconds (~8x speedup).
>
> [1] https://lore.kernel.org/r/20181108181201.88826-3-joelaf@xxxxxxxxxx
> [2] https://www.mail-archive.com/linuxppc-dev@xxxxxxxxxxxxxxxx/msg140837.html
>
> Signed-off-by: Kalesh Singh <kaleshsingh@xxxxxxxxxx>
> Acked-by: Kirill A. Shutemov <kirill.shutemov@xxxxxxxxxxxxxxx>
> Cc: Catalin Marinas <catalin.marinas@xxxxxxx>
> Cc: Will Deacon <will@xxxxxxxxxx>
> Cc: Andrew Morton <akpm@xxxxxxxxxxxxxxxxxxxx>
> ---
> Changes in v4:
> - Add Kirill's Acked-by.

Argh, I thought we already enabled this for PMDs back in 2018! Looks like
we forgot to actually do that after I improved the performance of the TLB
invalidation.

I'll pick this one patch up for 5.10.

Will
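
For readers who want to reproduce a measurement like the one quoted above,
here is a minimal userspace sketch. It is not the test program used for the
posted numbers, which is not shown in this thread. It assumes a 2 MiB PMD
size (4 KiB base pages), and the names PMD_SIZE, align_up() and
time_mremap_ns() are made up for illustration. It maps a PMD-aligned source
region, touches it so that page tables exist, and times a single mremap() to
a PMD-aligned destination.

```c
#define _GNU_SOURCE		/* for mremap() and MREMAP_FIXED */
#include <assert.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>
#include <sys/mman.h>
#include <time.h>

/* Assumption: 2 MiB PMD size, i.e. 4 KiB base pages. Not taken from the patch. */
#define PMD_SIZE (2UL << 20)

/* Round addr up to the next multiple of align (align must be a power of two). */
static uintptr_t align_up(uintptr_t addr, uintptr_t align)
{
	assert(align && (align & (align - 1)) == 0);
	return (addr + align - 1) & ~(align - 1);
}

/*
 * Hypothetical helper: time one mremap() of `size` bytes between two
 * PMD-aligned addresses. Returns elapsed nanoseconds, or -1 on failure.
 */
static int64_t time_mremap_ns(size_t size)
{
	size_t span = size + PMD_SIZE;	/* slack so we can align inside it */
	struct timespec t0, t1;
	int64_t ns = -1;

	if (size == 0)
		return -1;

	char *src_raw = mmap(NULL, span, PROT_READ | PROT_WRITE,
			     MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
	if (src_raw == MAP_FAILED)
		return -1;

	/* Address-space reservation only; mremap() replaces part of it. */
	char *dst_raw = mmap(NULL, span, PROT_NONE,
			     MAP_PRIVATE | MAP_ANONYMOUS | MAP_NORESERVE, -1, 0);
	if (dst_raw == MAP_FAILED) {
		munmap(src_raw, span);
		return -1;
	}

	char *src = (char *)align_up((uintptr_t)src_raw, PMD_SIZE);
	char *dst = (char *)align_up((uintptr_t)dst_raw, PMD_SIZE);

	/* Fault in every page so there are populated page tables to move. */
	memset(src, 0x5a, size);

	clock_gettime(CLOCK_MONOTONIC, &t0);
	void *moved = mremap(src, size, size,
			     MREMAP_MAYMOVE | MREMAP_FIXED, dst);
	clock_gettime(CLOCK_MONOTONIC, &t1);

	/* Only report a time if the move succeeded and the data followed. */
	if (moved == (void *)dst && dst[0] == 0x5a && dst[size - 1] == 0x5a)
		ns = (int64_t)(t1.tv_sec - t0.tv_sec) * 1000000000LL +
		     (t1.tv_nsec - t0.tv_nsec);

	munmap(src_raw, span);
	munmap(dst_raw, span);
	return ns;
}

#ifdef MREMAP_BENCH_MAIN
int main(void)
{
	/* Ten iterations over a 1 GiB region, mirroring the quoted setup. */
	for (int i = 0; i < 10; i++)
		printf("%lld\n", (long long)time_mremap_ns(1UL << 30));
	return 0;
}
#endif
```

Building with -DMREMAP_BENCH_MAIN gives a standalone program that prints one
time per iteration. The 1 GiB run needs that much free memory, and absolute
numbers will depend heavily on the hardware and kernel version.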