Re: [PATCH v6 00/11] KVM: arm64: Add support for FEAT_TLBIRANGE

From: Raghavendra Rao Ananta
Date: Fri Jul 14 2023 - 22:02:40 EST


On Fri, Jul 14, 2023 at 5:54 PM Raghavendra Rao Ananta
<rananta@xxxxxxxxxx> wrote:
>
> In certain code paths, KVM/ARM currently invalidates the entire VM's
> page-tables instead of just invalidating a necessary range. For example,
> when collapsing a table PTE to a block PTE, instead of iterating over
> each PTE and flushing them, KVM uses 'vmalls12e1is' TLBI operation to
> flush all the entries. This is inefficient since the guest would have
> to refill the TLBs again, even for the addresses that aren't covered
> by the table entry. The performance impact would scale poorly if many
> addresses in the VM are going through this remapping.
>
> For architectures that implement FEAT_TLBIRANGE, KVM can replace such
> inefficient paths by performing the invalidations only on the range of
> addresses that are in scope. This series tries to achieve the same in
> the areas of stage-2 map, unmap and write-protecting the pages.
>
> As suggested by Oliver in the original v5 of the series [1], I'm
> reposting the series by including v2 of David Matlack's 'KVM: Add a
> common API for range-based TLB invalidation' series [2].
>
> Patches 1-4 include David M.'s patches 1, 2, 6, and 7 from [2].
>
> Patch-5 refactors the core arm64's __flush_tlb_range() to be used by
> other entities.
>
> Patches 6 and 7 add a range-based TLBI mechanism for KVM (VHE and nVHE).
>
> Patch-8 implements the kvm_arch_flush_remote_tlbs_range() for arm64.
>
> Patch-9 aims to flush only the memslot that undergoes a write-protect,
> instead of the entire VM.
>
> Patch-10 operates on stage2_try_break_pte() to use the range based
> TLBI instructions when collapsing a table entry. The map path is the
> immediate consumer of this when KVM remaps a table entry into a block.
>
> Patch-11 modifies the stage-2 unmap path so that, if the system
> supports FEAT_TLBIRANGE, the TLB invalidations are skipped during the
> page-table walk. Instead, they are done in one go after the entire walk
> is finished.
>
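The deferral described for Patch-11 can be pictured with a small userspace
toy model. This is only a sketch and not code from the series: the names
sketch_unmap() and struct sketch_stats are made up, an array of valid bits
stands in for the leaf entries, and counters stand in for the TLBIs.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Counters for the toy model; nothing here touches real TLBs. */
struct sketch_stats {
	unsigned long per_entry_tlbis;	/* invalidations issued inside the walk */
	unsigned long range_tlbis;	/* invalidations issued after the walk */
};

/*
 * Hypothetical "unmap walk" over an array of valid bits standing in for
 * leaf entries. With defer_flush set, entries are only cleared during the
 * walk and a single range invalidation is recorded at the end.
 */
static struct sketch_stats sketch_unmap(bool *valid, size_t nr_entries,
					bool defer_flush)
{
	struct sketch_stats st = { 0, 0 };
	size_t i;

	assert(valid != NULL);

	for (i = 0; i < nr_entries; i++) {
		if (!valid[i])
			continue;

		valid[i] = false;		/* clear the entry */
		if (!defer_flush)
			st.per_entry_tlbis++;	/* flush this entry right away */
	}

	if (defer_flush)
		st.range_tlbis++;		/* one flush for the whole range */

	return st;
}
```

In this model, unmapping 512 valid entries records 512 invalidations
without deferral and a single one with it.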
> The series is based off of upstream v6.5-rc1.
>
> The performance evaluation was done on hardware that supports
> FEAT_TLBIRANGE, in a VHE configuration, using a modified
> kvm_page_table_test. The modified version updates the guest code in the
> ADJUST_MAPPINGS case to access not only the current page but also up to
> 512 pages backwards for every new page it iterates through. This is done
> to test the effect of TLB misses after KVM has handled a fault.
>
> The series captures the impact in the map and unmap paths as described
> above.
>
> $ kvm_page_table_test -m 2 -v 128 -s anonymous_hugetlb_2mb -b $i
>
> +--------+------------------------------+------------------------------+
> | mem_sz |     ADJUST_MAPPINGS (s)      |         Unmap VM (s)         |
> |  (GB)  | Baseline | Baseline + series | Baseline | Baseline + series |
> +--------+----------+-------------------+----------+-------------------+
> |      1 |     3.33 |              3.22 |    0.009 |             0.005 |
> |      2 |     7.39 |              7.32 |    0.012 |             0.006 |
> |      4 |    13.49 |             10.50 |    0.017 |             0.008 |
> |      8 |    21.60 |             21.50 |    0.027 |             0.011 |
> |     16 |    57.02 |             43.63 |    0.046 |             0.018 |
> |     32 |    95.92 |             83.26 |    0.087 |             0.030 |
> |     64 |   199.57 |            165.14 |    0.146 |             0.055 |
> |    128 |   423.65 |            349.37 |    0.280 |             0.100 |
> +--------+----------+-------------------+----------+-------------------+
>
> $ kvm_page_table_test -m 2 -b 128G -s anonymous_hugetlb_2mb -v $i
>
> +--------+------------------------------+
> | vCPUs  |     ADJUST_MAPPINGS (s)      |
> |        | Baseline | Baseline + series |
> +--------+----------+-------------------+
> |      1 |   111.44 |            114.63 |
> |      2 |   102.88 |             74.64 |
> |      4 |   134.83 |             98.78 |
> |      8 |    98.81 |             95.01 |
> |     16 |   127.41 |             99.05 |
> |     32 |   105.35 |             91.75 |
> |     64 |   201.13 |            163.63 |
> |    128 |   423.65 |            349.37 |
> +--------+----------+-------------------+
>
> For the ADJUST_MAPPINGS cases, which map the 4K table entries back to
> 2M hugepages, the series sees an average improvement of ~15%. For
> unmapping 2M hugepages, we see a gain of 2x to 3x.
>
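For anyone who wants to re-derive the summary figures from the two 2M
hugepage tables, a rough sketch follows. The cover letter does not say how
the average was taken, so the per-row mean used here is an assumption, and
the helper names mean_improvement() and speedup_range() are made up. Note
that the 128 GB / 128 vCPU row appears in both tables and is therefore
counted twice.

```c
#include <assert.h>
#include <stddef.h>

/* ADJUST_MAPPINGS times (s): mem_sz table rows first, then vCPU table rows. */
static const double adj_base[] = {
	3.33, 7.39, 13.49, 21.60, 57.02, 95.92, 199.57, 423.65,
	111.44, 102.88, 134.83, 98.81, 127.41, 105.35, 201.13, 423.65,
};
static const double adj_series[] = {
	3.22, 7.32, 10.50, 21.50, 43.63, 83.26, 165.14, 349.37,
	114.63, 74.64, 98.78, 95.01, 99.05, 91.75, 163.63, 349.37,
};

/* Unmap VM times (s) from the mem_sz table. */
static const double unmap_base[] = {
	0.009, 0.012, 0.017, 0.027, 0.046, 0.087, 0.146, 0.280,
};
static const double unmap_series[] = {
	0.005, 0.006, 0.008, 0.011, 0.018, 0.030, 0.055, 0.100,
};

/* Mean of the per-row fractional time reduction (0.15 means 15%). */
static double mean_improvement(const double *base, const double *with,
			       size_t n)
{
	double sum = 0.0;
	size_t i;

	assert(n > 0);
	for (i = 0; i < n; i++)
		sum += (base[i] - with[i]) / base[i];

	return sum / (double)n;
}

/* Smallest and largest per-row speedup (baseline time / series time). */
static void speedup_range(const double *base, const double *with, size_t n,
			  double *min, double *max)
{
	size_t i;

	assert(n > 0);
	*min = *max = base[0] / with[0];
	for (i = 1; i < n; i++) {
		double s = base[i] / with[i];

		if (s < *min)
			*min = s;
		if (s > *max)
			*max = s;
	}
}
```

Calling mean_improvement(adj_base, adj_series, 16) and
speedup_range(unmap_base, unmap_series, 8, &lo, &hi) gives numbers in line
with the figures quoted above.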
> $ kvm_page_table_test -m 2 -b $i
>
> +--------+------------------------------+
> | mem_sz |         Unmap VM (s)         |
> |  (GB)  | Baseline | Baseline + series |
> +--------+----------+-------------------+
> |      1 |     0.54 |              0.13 |
> |      2 |     1.07 |              0.25 |
> |      4 |     2.10 |              0.47 |
> |      8 |     4.19 |              0.92 |
> |     16 |     8.35 |              1.92 |
> |     32 |    16.66 |              3.61 |
> |     64 |    32.36 |              7.62 |
> |    128 |    64.65 |             14.39 |
> +--------+----------+-------------------+
>
> The series sees an average gain of 4x when the guest is backed by
> PAGE_SIZE (4K) pages.
>
> Other testing:
> - Booted on x86_64 and ran KVM selftests.
> - Build tested for MIPS and RISCV architectures against defconfig.
>
> Cc: David Matlack <dmatlack@xxxxxxxxxx>
>
> v6:
This should've been 'v5 (RESEND)' with the link:
https://lore.kernel.org/all/20230621175002.2832640-1-rananta@xxxxxxxxxx/

- Raghavendra
> Thanks, Gavin for the suggestions:
> - Adjusted the comment on patch-2 to align with the code.
> - Fixed checkpatch.pl warning on patch-5.
>
> v5:
> https://lore.kernel.org/all/20230606192858.3600174-1-rananta@xxxxxxxxxx/
> Thank you, Marc and Oliver for the comments
> - Introduced a helper, kvm_tlb_flush_vmid_range(), to handle
> the decision of using range-based TLBI instructions or
> invalidating the entire VMID, rather than depending on
> __kvm_tlb_flush_vmid_range() for it.
> - kvm_tlb_flush_vmid_range() splits the range-based invalidations
> if the requested range exceeds MAX_TLBI_RANGE_PAGES.
> - All the users that need to invalidate the TLB over a range
> now depend on kvm_tlb_flush_vmid_range() rather than directly
> on __kvm_tlb_flush_vmid_range().
> - stage2_unmap_defer_tlb_flush() introduces a WARN_ON() to
> track if there's any change in TLBIRANGE or FWB support
> during the unmap process as the features are based on
> alternative patching and the TLBI operations solely depend
> on this check.
> - Corrected an incorrect hunk that was present in v4's patch-3.
> - Updated the patches changelog and code comments as per the
> suggestions.
>
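The splitting behaviour listed for kvm_tlb_flush_vmid_range() in the v5
notes can be modelled in userspace roughly as below. This is a sketch under
stated assumptions, not the patch itself: flush_range_chunked(),
issue_range_tlbi(), SKETCH_PAGE_SHIFT and SKETCH_MAX_RANGE_PAGES are made-up
names, the chunk limit is an assumed stand-in for MAX_TLBI_RANGE_PAGES, and
the stub only counts calls where the kernel would call into the hypervisor.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Assumed values for illustration only; not taken from the series. */
#define SKETCH_PAGE_SHIFT	12
#define SKETCH_MAX_RANGE_PAGES	(1UL << 21)	/* stand-in for MAX_TLBI_RANGE_PAGES */

/* Bookkeeping so a caller can observe what the stub was asked to do. */
static unsigned long sketch_calls;
static unsigned long sketch_pages_flushed;

/* Hypothetical stub for one range-based invalidation of 'pages' pages. */
static void issue_range_tlbi(uint64_t addr, unsigned long pages)
{
	(void)addr;
	assert(pages > 0 && pages <= SKETCH_MAX_RANGE_PAGES);
	sketch_calls++;
	sketch_pages_flushed += pages;
}

/* Split [addr, addr + size) into chunks no larger than the limit. */
static void flush_range_chunked(uint64_t addr, uint64_t size)
{
	uint64_t pages = size >> SKETCH_PAGE_SHIFT;

	while (pages > 0) {
		unsigned long n = pages < SKETCH_MAX_RANGE_PAGES ?
				  (unsigned long)pages : SKETCH_MAX_RANGE_PAGES;

		issue_range_tlbi(addr, n);
		addr += (uint64_t)n << SKETCH_PAGE_SHIFT;
		pages -= n;
	}
}
```

With these assumed constants, a 16 GiB range is issued as two chunks, while
a range of a few pages goes out as a single call.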
> v4:
> https://lore.kernel.org/all/20230519005231.3027912-1-rananta@xxxxxxxxxx/
> Thanks again, Oliver for all the comments
> - Updated the __kvm_tlb_flush_vmid_range() implementation for
> nVHE to adapt to the modified __tlb_switch_to_guest() that
> accepts a new 'bool nsh' arg.
> - Renamed stage2_put_pte() to stage2_unmap_put_pte() and removed
> the 'skip_flush' argument.
> - Defined stage2_unmap_defer_tlb_flush() to check if the PTE
> flushes can be deferred during the unmap table walk. It's
> being called from stage2_unmap_put_pte() and
> kvm_pgtable_stage2_unmap().
> - Got rid of the 'struct stage2_unmap_data'.
>
> v3:
> https://lore.kernel.org/all/20230414172922.812640-1-rananta@xxxxxxxxxx/
> Thanks, Oliver for all the suggestions.
> - The core flush API (__kvm_tlb_flush_vmid_range()) now checks if
> the system supports FEAT_TLBIRANGE or not, thus eliminating the
> redundancy in the upper layers.
> - If FEAT_TLBIRANGE is not supported, the implementation falls
> back to invalidating all the TLB entries with the VMID, instead
> of doing an iterative flush for the range.
> - The kvm_arch_flush_remote_tlbs_range() doesn't return -EOPNOTSUPP
> if the system doesn't implement FEAT_TLBIRANGE. It depends on
> __kvm_tlb_flush_vmid_range() to take care of the decisions
> and returns 0 regardless of the underlying feature support.
> - __kvm_tlb_flush_vmid_range() doesn't take 'level' as input to
> calculate the 'stride'. Instead, it always assumes PAGE_SIZE.
> - Fast unmap path is eliminated. Instead, the existing unmap walker
> is modified to skip the TLBIs during the walk, and do it all at
> once after the walk, using the range-based instructions.
>
> v2:
> https://lore.kernel.org/all/20230206172340.2639971-1-rananta@xxxxxxxxxx/
> - Rebased the series on top of David Matlack's series for common
> TLB invalidation API[1].
> - Implement kvm_arch_flush_remote_tlbs_range() for arm64, by extending
> the support introduced by [1].
> - Use kvm_flush_remote_tlbs_memslot() introduced by [1] to flush
> only the current memslot after write-protect.
> - Modified the __kvm_tlb_flush_range() macro to accept 'level' as an
> argument to calculate the 'stride' instead of just using PAGE_SIZE.
> - Split the patch that introduces the range-based TLBI to KVM and the
> implementation of IPA-based invalidation into its own patches.
> - Dropped the patch that tries to optimize the mmu notifiers paths.
> - Rename the function kvm_table_pte_flush() to
> kvm_pgtable_stage2_flush_range(), and accept the range of addresses to
> flush. [Oliver]
> - Drop the 'tlb_level' argument for stage2_try_break_pte() and directly
> pass '0' as 'tlb_level' to kvm_pgtable_stage2_flush_range(). [Oliver]
>
> v1:
> https://lore.kernel.org/all/20230109215347.3119271-1-rananta@xxxxxxxxxx/
>
> Thank you.
> Raghavendra
>
> [1]: https://lore.kernel.org/all/ZIrONR6cSegiK1e2@xxxxxxxxx/
> [2]:
> https://lore.kernel.org/linux-arm-kernel/20230126184025.2294823-1-dmatlack@xxxxxxxxxx/
>
> David Matlack (4):
> KVM: Rename kvm_arch_flush_remote_tlb() to
> kvm_arch_flush_remote_tlbs()
> KVM: arm64: Use kvm_arch_flush_remote_tlbs()
> KVM: Allow range-based TLB invalidation from common code
> KVM: Move kvm_arch_flush_remote_tlbs_memslot() to common code
>
> Raghavendra Rao Ananta (7):
> arm64: tlb: Refactor the core flush algorithm of __flush_tlb_range
> KVM: arm64: Implement __kvm_tlb_flush_vmid_range()
> KVM: arm64: Define kvm_tlb_flush_vmid_range()
> KVM: arm64: Implement kvm_arch_flush_remote_tlbs_range()
> KVM: arm64: Flush only the memslot after write-protect
> KVM: arm64: Invalidate the table entries upon a range
> KVM: arm64: Use TLBI range-based instructions for unmap
>
> arch/arm64/include/asm/kvm_asm.h | 3 +
> arch/arm64/include/asm/kvm_host.h | 6 ++
> arch/arm64/include/asm/kvm_pgtable.h | 10 +++
> arch/arm64/include/asm/tlbflush.h | 109 ++++++++++++++-------------
> arch/arm64/kvm/Kconfig | 1 -
> arch/arm64/kvm/arm.c | 6 --
> arch/arm64/kvm/hyp/nvhe/hyp-main.c | 11 +++
> arch/arm64/kvm/hyp/nvhe/tlb.c | 30 ++++++++
> arch/arm64/kvm/hyp/pgtable.c | 90 +++++++++++++++++++---
> arch/arm64/kvm/hyp/vhe/tlb.c | 23 ++++++
> arch/arm64/kvm/mmu.c | 15 +++-
> arch/mips/include/asm/kvm_host.h | 4 +-
> arch/mips/kvm/mips.c | 12 +--
> arch/riscv/kvm/mmu.c | 6 --
> arch/x86/include/asm/kvm_host.h | 7 +-
> arch/x86/kvm/mmu/mmu.c | 25 ++----
> arch/x86/kvm/mmu/mmu_internal.h | 3 -
> arch/x86/kvm/x86.c | 2 +-
> include/linux/kvm_host.h | 20 +++--
> virt/kvm/Kconfig | 3 -
> virt/kvm/kvm_main.c | 35 +++++++--
> 21 files changed, 290 insertions(+), 131 deletions(-)
>
> --
> 2.41.0.455.g037347b96a-goog
>