[PATCH 0/4] arm64: an optimization for AmpereOne

From: Huang Shijie
Date: Wed Nov 22 2023 - 04:29:44 EST


0) Background:
We found that AmpereOne benefits from aggressive prefetches when
using a 4K page size.

1) This patch set:
1.1) adds a new WORKAROUND_AMPERE_AC03_PREFETCH capability.
1.2) uses MIDR_AMPERE1 to filter the processor.
1.3) uses alternative_if to patch in the alternative code
for AmpereOne.
1.4) adds software prefetches to the 128-byte copy loop in
copy_template.S. It also adds a macro, add_prefetch.
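
For readers unfamiliar with the technique in 1.4, here is a minimal
userspace C sketch of a copy loop that prefetches ahead of its loads.
It is an illustration only, not the code from this series: the series
changes the assembly in arch/arm64/lib/copy_template.S, while the
function name copy_with_prefetch, the PREFETCH_AHEAD distance and the
use of the GCC/Clang builtin __builtin_prefetch are all assumptions
made for this example.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical prefetch distance in bytes; not taken from the series. */
#define PREFETCH_AHEAD 1024

/*
 * Copy n bytes in 128-byte chunks. While at least PREFETCH_AHEAD bytes
 * remain past the current position, hint the CPU to start loading the
 * source data that a later iteration will read.
 */
void copy_with_prefetch(void *dst, const void *src, size_t n)
{
	unsigned char *d = dst;
	const unsigned char *s = src;

	while (n >= 128) {
		/* Only prefetch addresses that are still inside the buffer. */
		if (n > PREFETCH_AHEAD)
			__builtin_prefetch(s + PREFETCH_AHEAD, 0 /* read */, 3);

		memcpy(d, s, 128);
		d += 128;
		s += 128;
		n -= 128;
	}

	/* Copy the remaining tail of fewer than 128 bytes. */
	memcpy(d, s, n);
}
```

The prefetch is only a hint, so the copy result is the same with or
without it; only the timing changes. In the series the prefetching
variant is selected at boot via alternative_if, so other CPUs keep
running the original loop.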

2) Test result:
With hugetlb or tmpfs, sequential read performance improves
by up to 1.3x ~ 1.4x.


Huang Shijie (4):
extable: add __sort_main_extable
arm64: alternative: handle the kernel exception table
arm64: copy_template.S: add loop_for_copy_128_bytes macro
arm64: add software prefetches for AmpereOne

arch/arm64/Kconfig.platforms | 7 +++
arch/arm64/kernel/alternative.c | 18 +++++++
arch/arm64/kernel/cpu_errata.c | 9 ++++
arch/arm64/lib/copy_template.S | 87 +++++++++++++++++++++++----------
arch/arm64/tools/cpucaps | 1 +
include/linux/extable.h | 2 +
kernel/extable.c | 8 ++-
7 files changed, 105 insertions(+), 27 deletions(-)

--
2.40.1