[PATCH v3 0/8] Create 'old' ptes for faultaround mappings on arm64 with hardware access flag

From: Will Deacon
Date: Thu Jan 14 2021 - 13:02:12 EST


Hi again folks,

This is the third version of the patches I previously posted here:

v1: https://lore.kernel.org/r/20201209163950.8494-1-will@xxxxxxxxxx
v2: https://lore.kernel.org/r/20210108171517.5290-1-will@xxxxxxxxxx

The patches allow architectures to opt in at runtime for faultaround
mappings to be created as 'old' instead of 'young'. Although there have
been previous attempts at this, they failed either because the decision
was deferred to userspace [1] or because it was done unconditionally and
shown to regress benchmarks for particular architectures [2].
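
As a rough illustration of the opt-in described above, here is a minimal userspace sketch, not the code from the series. Only the name arch_wants_old_prefaulted_pte() comes from the patch titles; 'struct pte_sketch', 'cpu_has_hw_access_flag' and make_pte() are hypothetical stand-ins invented for this example, and the sketch assumes the hook is consulted only for prefaulted (faultaround) entries.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for a PTE: only the "accessed" (young) bit. */
struct pte_sketch {
	bool young;
};

/* Hypothetical stand-in for a runtime CPU feature probe. */
static bool cpu_has_hw_access_flag;

/*
 * Assumption: an architecture returns true here only when the hardware
 * can set the access flag itself, so 'old' entries are cheap to age.
 */
static bool arch_wants_old_prefaulted_pte(void)
{
	return cpu_has_hw_access_flag;
}

/* Build an entry; only prefaulted entries may be created 'old'. */
static struct pte_sketch make_pte(bool prefault)
{
	struct pte_sketch pte = { .young = true };

	if (prefault && arch_wants_old_prefaulted_pte())
		pte.young = false;

	return pte;
}

/* Usage: the decision follows the runtime probe, not a build option. */
static void example_usage(void)
{
	cpu_has_hw_access_flag = false;
	assert(make_pte(true).young);	/* no opt-in: stays 'young' */

	cpu_has_hw_access_flag = true;
	assert(!make_pte(true).young);	/* opt-in: prefault is 'old' */
	assert(make_pte(false).young);	/* the faulting page stays 'young' */
}
```

The point of the sketch is that the choice lives in an architecture hook evaluated at runtime, rather than being left to userspace or applied unconditionally.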

Minor changes since v2 include:

* Update commit messages
* Remove repeated word 'from from' in a comment
* Restore 'vmf->flags' in filemap_map_pages()

The major additions are in the five RFC patches at the end of the
series, which attempt to implement a suggestion from Linus to split up
'struct vm_fault', clearly separating the mutable and immutable fields
in the data structure. I used Coccinelle to do most of the mechanical
work, but I also ran into some tricky problems along the way:

1. 'vmf->flags' is modified on the '->page_mkwrite()' path so I couldn't
find a satisfactory way to move it to the new const structure. I toyed
with getting rid of FAULT_FLAG_[MK]WRITE completely and just tracking
these as bools, but there's also a weird piece of code in
vmw_bo_vm_mkwrite() which modifies FAULT_FLAG_ALLOW_RETRY, so I gave
up and left the 'flags' field alone.

2. I had to perform terrifying surgery on __collapse_huge_page_swapin()
and, in doing so, I'm a bit wary about the initialisation of 'pgoff',
as it isn't updated along with the address (this matches the old code).

3. vmf_insert_pfn_pmd() and friends take both a 'struct vm_fault' _and_
a 'bool write'. I have left them alone, but that FAULT_FLAG_WRITE is
causing trouble again.

4. Turns out 'struct vm_fault' is popular, so the diffstat is bloody
massive.
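
For anyone who hasn't looked at the RFC patches yet, the shape of the 'struct vm_fault' split can be sketched roughly as below. This is a simplified userspace illustration under stated assumptions, not the actual kernel definition: the 'info' field name, 'address', 'pgoff' and 'flags' are taken from the text above, while the '_sketch' type names, the reduced field set and make_fault() are hypothetical.

```c
#include <assert.h>

/* Hypothetical, cut-down set of fields that never change during a fault. */
struct vm_fault_info_sketch {
	unsigned long address;
	unsigned long pgoff;
};

struct vm_fault_sketch {
	/* Immutable part: can only be set by an initialiser. */
	const struct vm_fault_info_sketch info;
	/* Mutable part: 'flags' stays here, as per problem 1 above. */
	unsigned int flags;
};

/* A const member forces a static initialiser instead of assignments. */
static struct vm_fault_sketch make_fault(unsigned long address,
					 unsigned long pgoff,
					 unsigned int flags)
{
	struct vm_fault_sketch vmf = {
		.info = { .address = address, .pgoff = pgoff },
		.flags = flags,
	};

	return vmf;
}

/* Usage: mutable fields can be updated, 'info' is read-only. */
static void example_usage(void)
{
	struct vm_fault_sketch vmf = make_fault(0x1000UL, 1UL, 0U);

	vmf.flags |= 0x1U;		/* fine: mutable field */
	/* vmf.info.address = 0; */	/* would not compile: const member */

	assert(vmf.info.address == 0x1000UL);
	assert(vmf.info.pgoff == 1UL);
	assert(vmf.flags == 0x1U);
}
```

With this layout the compiler rejects any later write to the 'info' fields, which is why callers that used to patch the address in place need restructuring.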

Anyway, it'd be good to hear any thoughts on this lot, particularly with
regard to my comments above. I've also pushed the series here:

https://git.kernel.org/pub/scm/linux/kernel/git/will/linux.git/log/?h=faultaround

Cheers

Will

[1] https://www.spinics.net/lists/linux-mm/msg143831.html
[2] 315d09bf30c2 ("Revert "mm: make faultaround produce old ptes"")

Cc: Catalin Marinas <catalin.marinas@xxxxxxx>
Cc: Jan Kara <jack@xxxxxxx>
Cc: Minchan Kim <minchan@xxxxxxxxxx>
Cc: Andrew Morton <akpm@xxxxxxxxxxxxxxxxxxxx>
Cc: Kirill A. Shutemov <kirill.shutemov@xxxxxxxxxxxxxxx>
Cc: Linus Torvalds <torvalds@xxxxxxxxxxxxxxxxxxxx>
Cc: Vinayak Menon <vinmenon@xxxxxxxxxxxxxx>
Cc: Hugh Dickins <hughd@xxxxxxxxxx>
Cc: <kernel-team@xxxxxxxxxxx>

--->8

Kirill A. Shutemov (1):
  mm: Cleanup faultaround and finish_fault() codepaths

Will Deacon (7):
  mm: Allow architectures to request 'old' entries when prefaulting
  arm64: mm: Implement arch_wants_old_prefaulted_pte()
  mm: Separate fault info out of 'struct vm_fault'
  mm: Pass 'address' to map to do_set_pte() and drop FAULT_FLAG_PREFAULT
  mm: Avoid modifying vmf.info.address in __collapse_huge_page_swapin()
  mm: Use static initialisers for 'info' field of 'struct vm_fault'
  mm: Mark 'info' field of 'struct vm_fault' as 'const'

arch/arm64/include/asm/pgtable.h | 12 +-
arch/arm64/kernel/vdso.c | 4 +-
arch/powerpc/kvm/book3s_64_vio.c | 6 +-
arch/powerpc/kvm/book3s_hv_uvmem.c | 4 +-
arch/powerpc/kvm/book3s_xive_native.c | 13 +-
arch/powerpc/platforms/cell/spufs/file.c | 16 +-
arch/s390/kernel/vdso.c | 4 +-
arch/s390/kvm/kvm-s390.c | 2 +-
arch/x86/entry/vdso/vma.c | 22 +-
arch/x86/kernel/cpu/sgx/encl.c | 4 +-
drivers/char/agp/alpha-agp.c | 2 +-
drivers/char/mspec.c | 6 +-
drivers/dax/device.c | 37 +-
drivers/dma-buf/heaps/cma_heap.c | 6 +-
drivers/dma-buf/udmabuf.c | 4 +-
drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c | 4 +-
drivers/gpu/drm/armada/armada_gem.c | 6 +-
drivers/gpu/drm/drm_gem_shmem_helper.c | 8 +-
drivers/gpu/drm/drm_vm.c | 18 +-
drivers/gpu/drm/etnaviv/etnaviv_gem.c | 10 +-
drivers/gpu/drm/gma500/framebuffer.c | 4 +-
drivers/gpu/drm/gma500/gem.c | 8 +-
drivers/gpu/drm/i915/gem/i915_gem_mman.c | 8 +-
drivers/gpu/drm/msm/msm_gem.c | 11 +-
drivers/gpu/drm/nouveau/nouveau_dmem.c | 8 +-
drivers/gpu/drm/nouveau/nouveau_ttm.c | 2 +-
drivers/gpu/drm/omapdrm/omap_gem.c | 20 +-
drivers/gpu/drm/radeon/radeon_ttm.c | 4 +-
drivers/gpu/drm/tegra/gem.c | 6 +-
drivers/gpu/drm/ttm/ttm_bo_vm.c | 10 +-
drivers/gpu/drm/vc4/vc4_bo.c | 2 +-
drivers/gpu/drm/vgem/vgem_drv.c | 6 +-
drivers/gpu/drm/vmwgfx/vmwgfx_page_dirty.c | 12 +-
drivers/hsi/clients/cmt_speech.c | 2 +-
drivers/hwtracing/intel_th/msu.c | 8 +-
drivers/infiniband/core/uverbs_main.c | 10 +-
drivers/infiniband/hw/hfi1/file_ops.c | 2 +-
drivers/infiniband/hw/qib/qib_file_ops.c | 2 +-
drivers/media/v4l2-core/videobuf-dma-sg.c | 6 +-
drivers/misc/cxl/context.c | 9 +-
drivers/misc/ocxl/context.c | 10 +-
drivers/misc/ocxl/sysfs.c | 8 +-
drivers/misc/sgi-gru/grumain.c | 4 +-
drivers/scsi/cxlflash/ocxl_hw.c | 6 +-
drivers/scsi/cxlflash/superpipe.c | 2 +-
drivers/scsi/sg.c | 4 +-
drivers/target/target_core_user.c | 6 +-
drivers/uio/uio.c | 6 +-
drivers/usb/mon/mon_bin.c | 4 +-
drivers/vfio/pci/vfio_pci.c | 2 +-
drivers/vfio/pci/vfio_pci_nvlink2.c | 8 +-
drivers/vhost/vdpa.c | 6 +-
drivers/video/fbdev/core/fb_defio.c | 14 +-
drivers/xen/privcmd-buf.c | 5 +-
drivers/xen/privcmd.c | 4 +-
fs/9p/vfs_file.c | 2 +-
fs/afs/write.c | 2 +-
fs/btrfs/inode.c | 4 +-
fs/ceph/addr.c | 6 +-
fs/dax.c | 53 +--
fs/ext2/file.c | 6 +-
fs/ext4/file.c | 6 +-
fs/ext4/inode.c | 4 +-
fs/f2fs/file.c | 8 +-
fs/fuse/dax.c | 2 +-
fs/fuse/file.c | 4 +-
fs/gfs2/file.c | 8 +-
fs/iomap/buffered-io.c | 2 +-
fs/kernfs/file.c | 4 +-
fs/nfs/file.c | 2 +-
fs/nilfs2/file.c | 2 +-
fs/ocfs2/mmap.c | 8 +-
fs/orangefs/file.c | 2 +-
fs/orangefs/inode.c | 4 +-
fs/proc/vmcore.c | 4 +-
fs/ubifs/file.c | 2 +-
fs/userfaultfd.c | 17 +-
fs/xfs/xfs_file.c | 18 +-
fs/zonefs/super.c | 6 +-
include/linux/huge_mm.h | 6 +-
include/linux/mm.h | 21 +-
include/linux/pgtable.h | 11 +
include/trace/events/fs_dax.h | 28 +-
ipc/shm.c | 2 +-
kernel/events/core.c | 12 +-
kernel/relay.c | 4 +-
lib/test_hmm.c | 4 +-
mm/filemap.c | 208 +++++++---
mm/huge_memory.c | 57 +--
mm/hugetlb.c | 6 +-
mm/internal.h | 4 +-
mm/khugepaged.c | 39 +-
mm/memory.c | 452 +++++++++------------
mm/mmap.c | 6 +-
mm/shmem.c | 16 +-
mm/swap_state.c | 19 +-
mm/swapfile.c | 13 +-
samples/vfio-mdev/mbochs.c | 10 +-
security/selinux/selinuxfs.c | 4 +-
sound/core/pcm_native.c | 8 +-
sound/usb/usx2y/us122l.c | 4 +-
sound/usb/usx2y/usX2Yhwdep.c | 8 +-
sound/usb/usx2y/usx2yhwdeppcm.c | 4 +-
virt/kvm/kvm_main.c | 12 +-
104 files changed, 821 insertions(+), 730 deletions(-)

--
2.30.0.284.gd98b1dd5eaa7-goog