[PATCH mm-unstable v1 00/20] mm/gup: remove FOLL_FORCE usage from drivers (reliable R/O long-term pinning)

From: David Hildenbrand
Date: Wed Nov 16 2022 - 05:31:49 EST


For now, we did not support reliable R/O long-term pinning in COW mappings.
That means, if we would trigger R/O long-term pinning in MAP_PRIVATE
mapping, we could end up pinning the (R/O-mapped) shared zeropage or a
pagecache page.

The next write access would trigger a write fault and replace the pinned
page by an exclusive anonymous page in the process page table; whatever the
process would write to that private page copy would not be visible by the
owner of the previous page pin: for example, RDMA could read stale data.
The end result is essentially an unexpected and hard-to-debug memory
corruption.

Some drivers tried working around that limitation by using
"FOLL_FORCE|FOLL_WRITE|FOLL_LONGTERM" for R/O long-term pinning for now.
FOLL_WRITE would trigger a write fault, if required, and break COW before
pinning the page. FOLL_FORCE is required because the VMA might lack write
permissions, and drivers wanted to make that working as well, just like
one would expect (no write access, but still triggering a write access to
break COW).

However, that is not a practical solution, because
(1) Drivers that don't stick to that undocumented and debatable pattern
would still run into that issue. For example, VFIO only uses
FOLL_LONGTERM for R/O long-term pinning.
(2) Using FOLL_WRITE just to work around a COW mapping + page pinning
limitation is unintuitive. FOLL_WRITE would, for example, mark the
page softdirty or trigger uffd-wp, even though, there actually isn't
going to be any write access.
(3) The purpose of FOLL_FORCE is debug access, not access without lack of
VMA permissions by arbitrarty drivers.

So instead, make R/O long-term pinning work as expected, by breaking COW
in a COW mapping early, such that we can remove any FOLL_FORCE usage from
drivers and make FOLL_FORCE ptrace-specific (renaming it to FOLL_PTRACE).
More details in patch #8.

Patches #1--#3 add COW tests for non-anonymous pages.
Patches #4--#7 prepare core MM for extended FAULT_FLAG_UNSHARE support in
COW mappings.
Patch #8 implements reliable R/O long-term pinning in COW mappings
Patches #9--#19 remove any FOLL_FORCE usage from drivers.
Patch #20 renames FOLL_FORCE to FOLL_PTRACE.

I'm refraining from CCing all driver/arch maintainers on the whole patch
set, but only CC them on the cover letter and the applicable patch
(I know, I know, someone is always unhappy ... sorry).

RFC -> v1:
* Use term "ptrace" instead of "debuggers" in patch descriptions
* Added ACK/Tested-by
* "mm/frame-vector: remove FOLL_FORCE usage"
-> Adjust description
* "mm: rename FOLL_FORCE to FOLL_PTRACE"
-> Added

David Hildenbrand (20):
selftests/vm: anon_cow: prepare for non-anonymous COW tests
selftests/vm: cow: basic COW tests for non-anonymous pages
selftests/vm: cow: R/O long-term pinning reliability tests for
non-anon pages
mm: add early FAULT_FLAG_UNSHARE consistency checks
mm: add early FAULT_FLAG_WRITE consistency checks
mm: rework handling in do_wp_page() based on private vs. shared
mappings
mm: don't call vm_ops->huge_fault() in wp_huge_pmd()/wp_huge_pud() for
private mappings
mm: extend FAULT_FLAG_UNSHARE support to anything in a COW mapping
mm/gup: reliable R/O long-term pinning in COW mappings
RDMA/umem: remove FOLL_FORCE usage
RDMA/usnic: remove FOLL_FORCE usage
RDMA/siw: remove FOLL_FORCE usage
media: videobuf-dma-sg: remove FOLL_FORCE usage
drm/etnaviv: remove FOLL_FORCE usage
media: pci/ivtv: remove FOLL_FORCE usage
mm/frame-vector: remove FOLL_FORCE usage
drm/exynos: remove FOLL_FORCE usage
RDMA/hw/qib/qib_user_pages: remove FOLL_FORCE usage
habanalabs: remove FOLL_FORCE usage
mm: rename FOLL_FORCE to FOLL_PTRACE

arch/alpha/kernel/ptrace.c | 6 +-
arch/arm64/kernel/mte.c | 2 +-
arch/ia64/kernel/ptrace.c | 10 +-
arch/mips/kernel/ptrace32.c | 4 +-
arch/mips/math-emu/dsemul.c | 2 +-
arch/powerpc/kernel/ptrace/ptrace32.c | 4 +-
arch/sparc/kernel/ptrace_32.c | 4 +-
arch/sparc/kernel/ptrace_64.c | 8 +-
arch/x86/kernel/step.c | 2 +-
arch/x86/um/ptrace_32.c | 2 +-
arch/x86/um/ptrace_64.c | 2 +-
drivers/gpu/drm/etnaviv/etnaviv_gem.c | 8 +-
drivers/gpu/drm/exynos/exynos_drm_g2d.c | 2 +-
drivers/infiniband/core/umem.c | 8 +-
drivers/infiniband/hw/qib/qib_user_pages.c | 2 +-
drivers/infiniband/hw/usnic/usnic_uiom.c | 9 +-
drivers/infiniband/sw/siw/siw_mem.c | 9 +-
drivers/media/common/videobuf2/frame_vector.c | 2 +-
drivers/media/pci/ivtv/ivtv-udma.c | 2 +-
drivers/media/pci/ivtv/ivtv-yuv.c | 5 +-
drivers/media/v4l2-core/videobuf-dma-sg.c | 14 +-
drivers/misc/habanalabs/common/memory.c | 3 +-
fs/exec.c | 2 +-
fs/proc/base.c | 2 +-
include/linux/mm.h | 35 +-
include/linux/mm_types.h | 8 +-
kernel/events/uprobes.c | 4 +-
kernel/ptrace.c | 12 +-
mm/gup.c | 38 +-
mm/huge_memory.c | 13 +-
mm/hugetlb.c | 14 +-
mm/memory.c | 97 +++--
mm/util.c | 4 +-
security/tomoyo/domain.c | 2 +-
tools/testing/selftests/vm/.gitignore | 2 +-
tools/testing/selftests/vm/Makefile | 10 +-
tools/testing/selftests/vm/check_config.sh | 4 +-
.../selftests/vm/{anon_cow.c => cow.c} | 387 +++++++++++++++++-
tools/testing/selftests/vm/run_vmtests.sh | 8 +-
39 files changed, 575 insertions(+), 177 deletions(-)
rename tools/testing/selftests/vm/{anon_cow.c => cow.c} (75%)

--
2.38.1