Re:

From: Ryan Roberts
Date: Thu May 11 2023 - 09:14:08 EST


My appologies for the noise: A blank line between Cc and Subject has broken the
subject and grouping in lore.

Please Ignore this, I will resend.


On 11/05/2023 13:58, Ryan Roberts wrote:
> Date: Thu, 11 May 2023 11:38:28 +0100
> Subject: [PATCH v1 0/5] Encapsulate PTE contents from non-arch code
>
> Hi All,
>
> This series improves the encapsulation of pte entries by disallowing non-arch
> code from directly dereferencing pte_t pointers. Instead code must use a new
> helper, `pte_t ptep_deref(pte_t *ptep)`. By default, this helper does a direct
> dereference of the pointer, so generated code should be exactly the same. But
> it's presence sets us up for arch code being able to override the default to
> "virtualize" the ptes without needing to maintain a shadow table.
>
> I intend to take advantage of this for arm64 to enable use of its "contiguous
> bit" to coalesce multiple ptes into a single tlb entry, reducing pressure and
> improving performance. I have an RFC for the first part of this work at [1]. The
> cover letter there also explains the second part, which this series is enabling.
>
> I intend to post an RFC for the contpte changes in due course, but it would be
> good to get the ball rolling on this enabler.
>
> There are 2 reasons that I need the encapsulation:
>
> - Prevent leaking the arch-private PTE_CONT bit to the core code. If the core
> code reads a pte that contains this bit, it could end up calling
> set_pte_at() with the bit set which would confuse the implementation. So we
> can always clear PTE_CONT in ptep_deref() (and ptep_get()) to avoid a leaky
> abstraction.
> - Contiguous ptes have a single access and dirty bit for the contiguous range.
> So we need to "mix-in" those bits when the core is dereferencing a pte that
> lies in the contig range. There is code that dereferences the pte then takes
> different actions based on access/dirty (see e.g. write_protect_page()).
>
> While ptep_get() and ptep_get_lockless() already exist, both of them are
> implemented using READ_ONCE() by default. While we could use ptep_get() instead
> of the new ptep_deref(), I didn't want to risk performance regression.
> Alternatively, all call sites that currently use ptep_get() that need the
> lockless behaviour could be upgraded to ptep_get_lockless() and ptep_get() could
> be downgraded to a simple dereference. That would be cleanest, but is a much
> bigger (and likely error prone) change because all the arch code would need to
> be updated for the new definitions of ptep_get().
>
> The series is split up as follows:
>
> patchs 1-2: Fix bugs where code was _setting_ ptes directly, rather than using
> set_pte_at() and friends.
> patch 3: Fix highmem unmapping issue I spotted while doing the work.
> patch 4: Introduce the new ptep_deref() helper with default implementation.
> patch 5: Convert all direct dereferences to use ptep_deref().
>
> [1] https://lore.kernel.org/linux-mm/20230414130303.2345383-1-ryan.roberts@xxxxxxx/
>
> Thanks,
> Ryan
>
>
> Ryan Roberts (5):
> mm: vmalloc must set pte via arch code
> mm: damon must atomically clear young on ptes and pmds
> mm: Fix failure to unmap pte on highmem systems
> mm: Add new ptep_deref() helper to fully encapsulate pte_t
> mm: ptep_deref() conversion
>
> .../drm/i915/gem/selftests/i915_gem_mman.c | 8 +-
> drivers/misc/sgi-gru/grufault.c | 2 +-
> drivers/vfio/vfio_iommu_type1.c | 7 +-
> drivers/xen/privcmd.c | 2 +-
> fs/proc/task_mmu.c | 33 +++---
> fs/userfaultfd.c | 6 +-
> include/linux/hugetlb.h | 2 +-
> include/linux/mm_inline.h | 2 +-
> include/linux/pgtable.h | 13 ++-
> kernel/events/uprobes.c | 2 +-
> mm/damon/ops-common.c | 18 ++-
> mm/damon/ops-common.h | 4 +-
> mm/damon/paddr.c | 6 +-
> mm/damon/vaddr.c | 14 ++-
> mm/filemap.c | 2 +-
> mm/gup.c | 21 ++--
> mm/highmem.c | 12 +-
> mm/hmm.c | 2 +-
> mm/huge_memory.c | 4 +-
> mm/hugetlb.c | 2 +-
> mm/hugetlb_vmemmap.c | 6 +-
> mm/kasan/init.c | 9 +-
> mm/kasan/shadow.c | 10 +-
> mm/khugepaged.c | 24 ++--
> mm/ksm.c | 22 ++--
> mm/madvise.c | 6 +-
> mm/mapping_dirty_helpers.c | 4 +-
> mm/memcontrol.c | 4 +-
> mm/memory-failure.c | 6 +-
> mm/memory.c | 103 +++++++++---------
> mm/mempolicy.c | 6 +-
> mm/migrate.c | 14 ++-
> mm/migrate_device.c | 14 ++-
> mm/mincore.c | 2 +-
> mm/mlock.c | 6 +-
> mm/mprotect.c | 8 +-
> mm/mremap.c | 2 +-
> mm/page_table_check.c | 4 +-
> mm/page_vma_mapped.c | 26 +++--
> mm/pgtable-generic.c | 2 +-
> mm/rmap.c | 32 +++---
> mm/sparse-vmemmap.c | 8 +-
> mm/swap_state.c | 4 +-
> mm/swapfile.c | 16 +--
> mm/userfaultfd.c | 4 +-
> mm/vmalloc.c | 11 +-
> mm/vmscan.c | 14 ++-
> virt/kvm/kvm_main.c | 9 +-
> 48 files changed, 302 insertions(+), 236 deletions(-)
>
> --
> 2.25.1
>