[PATCH v1 0/3] Speed up boot with faster linear map creation

From: Ryan Roberts
Date: Tue Mar 26 2024 - 06:15:10 EST


Hi All,

It turns out that creating the linear map can take a significant proportion of
the total boot time, especially when rodata=full. A large portion of that time
is spent issuing TLBIs. This series reworks the kernel pgtable generation code
to significantly reduce the number of TLBIs; see each patch for details.
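
To illustrate why the per-block remapping is so expensive, here is a rough,
self-contained toy model (userspace C, not the actual arch/arm64/mm/mmu.c
code); all names in it (fixmap_map, fixmap_unmap, CONT_PTES, tlbi_count, etc.)
are illustrative stand-ins, and the counts only model the fixmap teardown TLBI
per remap:

/*
 * Toy model (not kernel code): compare how many TLB invalidations a
 * fixmap-based pgtable walker issues when it remaps the PTE table for
 * every contpte-sized block versus once per table.
 */
#include <stdio.h>

#define PTRS_PER_PTE 512   /* entries in one PTE table (4K pages) */
#define CONT_PTES    16    /* PTEs covered by one contiguous block */

static unsigned long tlbi_count;

/* Stand-ins for mapping/unmapping a table through the fixmap. */
static void fixmap_map(void)   { /* installing the mapping needs no TLBI */ }
static void fixmap_unmap(void) { tlbi_count++; /* clearing it needs a TLBI */ }

/* Old behaviour: remap the table for every cont-sized block. */
static unsigned long populate_per_cont_block(void)
{
	tlbi_count = 0;
	for (int i = 0; i < PTRS_PER_PTE; i += CONT_PTES) {
		fixmap_map();
		/* ... write CONT_PTES entries ... */
		fixmap_unmap();
	}
	return tlbi_count;
}

/* Reworked behaviour: map once, write the whole table, unmap once. */
static unsigned long populate_per_table(void)
{
	tlbi_count = 0;
	fixmap_map();
	/* ... write all PTRS_PER_PTE entries ... */
	fixmap_unmap();
	return tlbi_count;
}

int main(void)
{
	printf("TLBIs per PTE table, per-cont remap:  %lu\n",
	       populate_per_cont_block());
	printf("TLBIs per PTE table, per-table remap: %lu\n",
	       populate_per_table());
	return 0;
}

With these numbers the per-cont approach issues 32 TLBIs per PTE table versus
1 for the per-table approach, which is the kind of reduction the measurements
below reflect.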

The below shows the execution time of map_mem() across a couple of different
systems with different RAM configurations. We measure after applying each patch
and show the improvement relative to base (v6.9-rc1):

               | Apple M2 VM | Ampere Altra| Ampere Altra| Ampere Altra
               | VM, 16G     | VM, 64G     | VM, 256G    | Metal, 512G
---------------|-------------|-------------|-------------|-------------
               |   ms    (%) |   ms    (%) |   ms    (%) |   ms    (%)
---------------|-------------|-------------|-------------|-------------
base           |  151   (0%) | 2191   (0%) | 8990   (0%) | 17443  (0%)
no-cont-remap  |   77  (-49%)|  429  (-80%)| 1753  (-80%)|  3796 (-78%)
no-alloc-remap |   77  (-49%)|  375  (-83%)| 1532  (-83%)|  3366 (-81%)
lazy-unmap     |   63  (-58%)|  330  (-85%)| 1312  (-85%)|  2929 (-83%)

This series applies on top of v6.9-rc1. All mm selftests pass. I haven't yet
tested all VA size configs (although I don't anticipate any issues); I'll do
this as part of followup.

Thanks,
Ryan


Ryan Roberts (3):
arm64: mm: Don't remap pgtables per-cont(pte|pmd) block
arm64: mm: Don't remap pgtables for allocate vs populate
arm64: mm: Lazily clear pte table mappings from fixmap

 arch/arm64/include/asm/fixmap.h  |   5 +-
 arch/arm64/include/asm/mmu.h     |   8 +
 arch/arm64/include/asm/pgtable.h |   4 -
 arch/arm64/kernel/cpufeature.c   |  10 +-
 arch/arm64/mm/fixmap.c           |  11 +
 arch/arm64/mm/mmu.c              | 364 +++++++++++++++++++++++--------
 include/linux/pgtable.h          |   8 +
 7 files changed, 307 insertions(+), 103 deletions(-)

--
2.25.1