[RFC PATCH 0/2] Invalidate secondary IOMMU TLB on permission upgrade

From: Alistair Popple
Date: Tue Jun 20 2023 - 07:18:58 EST


==========
Background
==========

The arm64 architecture specifies TLB permission bits may be cached and
therefore the TLB must be invalidated during permission upgrades. For
the CPU this currently occurs in the architecture specific
ptep_set_access_flags() routine.

Secondary TLBs, such as those implemented by the SMMU IOMMU, follow the
CPU architecture specification, so they may also cache permission bits
and require the same TLB invalidations. This may be achieved in one of
two ways.

Some SMMU implementations support broadcast TLB maintenance (BTM). With
BTM the SMMU snoops CPU TLB invalidates and invalidates any secondary
TLB at the same time as the CPU. However, implementations are not
required to implement BTM.

Implementations without BTM rely on mmu notifier callbacks to send
explicit TLB invalidation commands to invalidate the SMMU TLB.
Therefore either generic kernel code or architecture specific code
needs to call the mmu notifier on permission upgrade.

Currently that doesn't happen, so devices will fault indefinitely when
writing to a PTE that was previously read-only because nothing
invalidates the SMMU TLB.
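
As a rough illustration of the failure described above, here is a toy
userspace model (not kernel code). Every name in it (toy_mm,
toy_tlb_entry, toy_set_access_flags() and so on) is hypothetical and
exists only for this sketch. It assumes a single PTE whose write
permission is cached by both a CPU TLB and a secondary TLB, and a
notify_secondary flag that models calling the mmu notifier on a
permission upgrade.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy model only: none of these types or functions are kernel APIs. */
struct toy_tlb_entry {
	bool valid;
	bool writable;	/* cached copy of the PTE write permission */
};

struct toy_mm {
	bool pte_writable;		/* the "page table" entry */
	struct toy_tlb_entry cpu_tlb;	/* flushed by the arch code today */
	struct toy_tlb_entry dev_tlb;	/* secondary (SMMU-like) TLB */
	bool notify_secondary;		/* models calling the mmu notifier */
};

/* Fill a TLB entry from the page table, as a hardware walker would. */
static void toy_tlb_fill(const struct toy_mm *mm, struct toy_tlb_entry *tlb)
{
	tlb->valid = true;
	tlb->writable = mm->pte_writable;
}

/* A write succeeds only if the (possibly stale) cached entry allows it. */
static bool toy_write_ok(const struct toy_mm *mm, struct toy_tlb_entry *tlb)
{
	if (!tlb->valid)
		toy_tlb_fill(mm, tlb);
	return tlb->writable;
}

/* Stand-in for the arch ptep_set_access_flags() on a permission upgrade. */
static void toy_set_access_flags(struct toy_mm *mm)
{
	assert(mm != NULL);
	mm->pte_writable = true;
	mm->cpu_tlb.valid = false;		/* existing CPU TLB flush */
	if (mm->notify_secondary)
		mm->dev_tlb.valid = false;	/* secondary TLB invalidation */
}
```

With notify_secondary left false, a device write that hit the
read-only entry before the upgrade keeps failing afterwards, because
the stale dev_tlb entry is never dropped. Setting the flag and
repeating the upgrade lets the next device access refill from the
updated PTE.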

To fix that, this series first renames the .invalidate_range() callback
to .invalidate_secondary_tlbs(), as suggested by Jason and Sean, to
make it clear this callback is only used for secondary TLBs. That was
made possible thanks to Sean's series [1] to remove KVM's incorrect
usage of the callback. Sean's series is currently in linux-next.

From here there are several possible solutions, and I would like some
feedback on a preferred approach.

=========
Solutions
=========

1. Add a call to mmu_notifier_invalidate_secondary_tlbs() to the arm64
version of ptep_set_access_flags().

This is what this RFC series does, as it is the simplest solution.
Arguably, though, this call should be made by generic kernel code to
catch other platforms that need it.

However, only ARM64, IA64 and Sparc flush the TLB in
ptep_set_access_flags(), and AFAIK only ARM64 has an IOMMU that uses
shared page-tables, so it is the only platform affected by this.

2. Add a call to mmu_notifier_invalidate_secondary_tlbs() to generic
kernel code.

The problem with this approach is that generic kernel code has no way
of knowing whether the call can be skipped for a given IOMMU. That
leads to over-invalidation and subsequent performance loss on the
majority of platforms that don't need it.

3. Implement a new set of notifier operations (e.g. tlb_notifier_ops)
specifically for secondary TLBs with a range of operations that can be
called by generic kernel code for every PTE modification.

See [2] for a prototype implementation of this idea.

This solves the problems of (1) and (2) because an IOMMU would only
implement the operations it needs. It also keeps the layering clean:
in theory there is no reason a secondary TLB has to follow the main
CPU architecture specification, so it is free to implement its own
operations (although I know of no hardware that does this).

However, it adds complexity to deal with a problem that only exists on
some implementations of a particular feature on one architecture. For
that reason I think (1) is the best path forward due to its
simplicity, but I would appreciate any feedback here.
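
To make the trade-off in (3) concrete, the sketch below shows one way
an ops table with optional callbacks could let each secondary TLB opt
in to only the events it cares about. It is a userspace toy, not the
prototype from [2]: the toy_tlb_notifier_ops structure, its unmap and
permission_upgrade callbacks, and the helper names are all assumptions
made up for this example.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical ops table; names are illustrative, not taken from [2]. */
struct toy_tlb_notifier;

struct toy_tlb_notifier_ops {
	/* Each callback is optional; NULL means "not needed by this TLB". */
	void (*unmap)(struct toy_tlb_notifier *tn,
		      unsigned long start, unsigned long end);
	void (*permission_upgrade)(struct toy_tlb_notifier *tn,
				   unsigned long start, unsigned long end);
};

struct toy_tlb_notifier {
	const struct toy_tlb_notifier_ops *ops;
	unsigned int invalidations;	/* commands "sent" to the device */
};

/* Pretend to send one invalidation command for [start, end). */
static void toy_count_invalidate(struct toy_tlb_notifier *tn,
				 unsigned long start, unsigned long end)
{
	assert(start < end);
	tn->invalidations++;
}

/* A TLB that caches permission bits needs both events. */
static const struct toy_tlb_notifier_ops toy_caching_ops = {
	.unmap = toy_count_invalidate,
	.permission_upgrade = toy_count_invalidate,
};

/* A TLB that re-walks on permission faults only needs unmap. */
static const struct toy_tlb_notifier_ops toy_rewalk_ops = {
	.unmap = toy_count_invalidate,
};

/* Generic code calls these for every PTE change; unset ops cost nothing. */
static void toy_notify_unmap(struct toy_tlb_notifier *tn,
			     unsigned long start, unsigned long end)
{
	if (tn->ops->unmap)
		tn->ops->unmap(tn, start, end);
}

static void toy_notify_permission_upgrade(struct toy_tlb_notifier *tn,
					  unsigned long start,
					  unsigned long end)
{
	if (tn->ops->permission_upgrade)
		tn->ops->permission_upgrade(tn, start, end);
}
```

In this model generic code can notify unconditionally, and a notifier
using toy_rewalk_ops sees no invalidation on a permission upgrade,
which is the over-invalidation that (2) cannot avoid.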

============
Other Issues
============

It is unclear if mmu_notifier_invalidate_secondary_tlbs() should be
called from mmu_notifier_invalidate_range_end(). Currently it is,
because an analysis of existing code shows most callers don't
explicitly invalidate secondary TLBs and instead rely on it being
called as part of the end() call.

The disadvantage of changing code to explicitly invalidate secondary
TLBs is that it generally can't take advantage of IOMMU-specific
range-based TLB invalidation commands, because explicit invalidations
happen one page at a time under the PTL.

To solve that we could add secondary TLB invalidation calls to the TLB
batching code. That adds complexity, though, so I'm not sure it's
worth it and would appreciate feedback.
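
For reference, the kind of batching meant here can be sketched as
follows. This is a simplified userspace model and not how the kernel's
TLB batching code is implemented: toy_tlb_batch, TOY_PAGE_SIZE and
both helpers are hypothetical names. It assumes per-page invalidation
requests arrive one at a time and coalesces contiguous pages into a
single range command.

```c
#include <assert.h>
#include <stdbool.h>

#define TOY_PAGE_SIZE 4096UL	/* assumed page size for this model */

/* Hypothetical batch state; not a kernel structure. */
struct toy_tlb_batch {
	unsigned long start;	/* pending range is [start, end) */
	unsigned long end;
	bool pending;
	unsigned int commands;	/* range invalidations issued so far */
};

/* Issue one range-based invalidation for the pending range, if any. */
static void toy_batch_flush(struct toy_tlb_batch *b)
{
	if (!b->pending)
		return;
	b->commands++;
	b->pending = false;
}

/* Queue a single page, extending the pending range when contiguous. */
static void toy_batch_add_page(struct toy_tlb_batch *b, unsigned long addr)
{
	assert((addr & (TOY_PAGE_SIZE - 1)) == 0);

	if (b->pending && addr == b->end) {
		b->end += TOY_PAGE_SIZE;
		return;
	}

	/* Not contiguous: send what we have and start a new range. */
	toy_batch_flush(b);
	b->start = addr;
	b->end = addr + TOY_PAGE_SIZE;
	b->pending = true;
}
```

Three contiguous pages followed by one distant page produce two range
commands in this model instead of four per-page ones. The cost is the
extra state and the need for a flush call at the end of the batch,
which is the added complexity in question.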

[1] - https://lore.kernel.org/all/20230602011518.787006-1-seanjc@xxxxxxxxxx/
[2] - https://lore.kernel.org/all/87h6rhw4i0.fsf@xxxxxxxxxx/

Alistair Popple (2):
  mm_notifiers: Rename invalidate_range notifier
  arm64: Notify on pte permission upgrades

 arch/arm64/mm/fault.c                           |  7 +-
 arch/arm64/mm/hugetlbpage.c                     |  9 +++-
 drivers/iommu/amd/iommu_v2.c                    | 10 +--
 drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3-sva.c | 13 ++--
 drivers/iommu/intel/svm.c                       |  8 +--
 drivers/misc/ocxl/link.c                        |  8 +--
 include/asm-generic/tlb.h                       |  2 +-
 include/linux/mmu_notifier.h                    | 55 +++++++++---------
 mm/huge_memory.c                                |  4 +-
 mm/hugetlb.c                                    | 10 +--
 mm/mmu_notifier.c                               | 52 ++++++++++-------
 mm/rmap.c                                       | 42 +++++++-------
 12 files changed, 125 insertions(+), 95 deletions(-)

base-commit: b16049b21162bb649cdd8519642a35972b7910fe
--
git-series 0.9.1