Re: [PATCH] mm, gup: introduce concept of "foreign" get_user_pages()

From: Vlastimil Babka
Date: Mon Jan 18 2016 - 10:20:31 EST


On 01/15/2016 07:11 PM, Dave Hansen wrote:
Just sending an update to this one patch instead of resending
the entire series.

Jan Kara suggested that we just change get_user_pages()'s
prototype to remove tsk/mm instead of introducing a separate
get_user_pages_current(). Also, we moved the "_foreign" in
get_user_pages_foreign() to the end to be more consistent with
the "_unlocked" version.

Thanks, and you could have blamed me too, not just Jan ;)

This approach will break any new users of get_user_pages()
which try to pass a tsk/mm, but Jan doesn't think these are
frequent enough to be a concern. This passes an allyesconfig
on 4.4, at least.

As always, any acks on this approach would be much appreciated.
This is the largest swath of non-x86 code that protection keys
touches, and I'm sure the x86 maintainers would appreciate
seeing some acks from folks on it.

This is finally a thorough review attempt, sorry I didn't catch some of the stuff below earlier.

---

From: Dave Hansen <dave.hansen@xxxxxxxxxxxxxxx>

For protection keys, we need to understand whether protections
should be enforced in software or not. In general, we enforce
protections when working on our own task, but not when on others.
We call these "current" and "foreign" operations.

This patch introduces a new get_user_pages() variant:

get_user_pages_foreign()

The plain get_user_pages() can no longer be used on mm/tasks
other than 'current/current->mm', which is by far the most common
way it is called. Using it makes a few of the call sites look a
bit nicer.

get_user_pages_foreign() is a replacement for when
get_user_pages() is called on non-current tsk/mm.

This also switches get_user_pages_unlocked() over to be like
get_user_pages() and not take a tsk/mm. If someone wants the
get_user_pages_unlocked() behavior with a non-current tsk/mm,
they just have to use __get_user_pages_unlocked() directly.

Hmm, but your patch actually changes __get_user_pages_unlocked() to also not include the task and mm params and assume current and current->mm? Wouldn't it be more consistent if __get_user_unlocked() stayed as it is?
What you say above is true for {__}get_user_pages_locked.

Signed-off-by: Dave Hansen <dave.hansen@xxxxxxxxxxxxxxx>
Cc: Andrew Morton <akpm@xxxxxxxxxxxxxxxxxxxx>
Cc: Kirill A. Shutemov <kirill.shutemov@xxxxxxxxxxxxxxx>
Cc: Andrea Arcangeli <aarcange@xxxxxxxxxx>
Cc: Naoya Horiguchi <n-horiguchi@xxxxxxxxxxxxx>
Cc: vbabka@xxxxxxx
Cc: jack@xxxxxxx
---

b/arch/cris/arch-v32/drivers/cryptocop.c | 8 ---
b/arch/ia64/kernel/err_inject.c | 3 -
b/arch/mips/mm/gup.c | 3 -
b/arch/s390/mm/gup.c | 4 -
b/arch/sh/mm/gup.c | 2
b/arch/sparc/mm/gup.c | 2
b/arch/x86/mm/gup.c | 2
b/arch/x86/mm/mpx.c | 4 -
b/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c | 3 -
b/drivers/gpu/drm/i915/i915_gem_userptr.c | 2
b/drivers/gpu/drm/radeon/radeon_ttm.c | 3 -
b/drivers/gpu/drm/via/via_dmablit.c | 3 -
b/drivers/infiniband/core/umem.c | 2
b/drivers/infiniband/core/umem_odp.c | 8 +--
b/drivers/infiniband/hw/mthca/mthca_memfree.c | 3 -
b/drivers/infiniband/hw/qib/qib_user_pages.c | 3 -
b/drivers/infiniband/hw/usnic/usnic_uiom.c | 2
b/drivers/media/pci/ivtv/ivtv-udma.c | 4 -
b/drivers/media/pci/ivtv/ivtv-yuv.c | 10 +---
b/drivers/media/v4l2-core/videobuf-dma-sg.c | 3 -
b/drivers/misc/mic/scif/scif_rma.c | 2
b/drivers/misc/sgi-gru/grufault.c | 3 -
b/drivers/scsi/st.c | 2
b/drivers/staging/rdma/hfi1/user_pages.c | 3 -
b/drivers/staging/rdma/ipath/ipath_user_pages.c | 3 -
b/drivers/video/fbdev/pvr2fb.c | 4 -
b/drivers/virt/fsl_hypervisor.c | 5 --
b/fs/exec.c | 8 ++-
b/include/linux/mm.h | 23 +++++-----
b/kernel/events/uprobes.c | 4 -
b/mm/frame_vector.c | 2
b/mm/gup.c | 51 +++++++++++++++---------
b/mm/ksm.c | 2
b/mm/memory.c | 2
b/mm/mempolicy.c | 6 +-
b/mm/nommu.c | 35 +++++++++-------
b/mm/process_vm_access.c | 6 +-
b/mm/util.c | 4 -
b/net/ceph/pagevec.c | 2
b/security/tomoyo/domain.c | 9 +++-
b/virt/kvm/async_pf.c | 2
b/virt/kvm/kvm_main.c | 13 ++----
42 files changed, 135 insertions(+), 130 deletions(-)

[...]

--- a/kernel/events/uprobes.c~get_current_user_pages 2016-01-15 09:45:42.110046066 -0800
+++ b/kernel/events/uprobes.c 2016-01-15 09:45:42.152047953 -0800
@@ -298,7 +298,7 @@ int uprobe_write_opcode(struct mm_struct

retry:
/* Read the page with vaddr into memory */
- ret = get_user_pages(NULL, mm, vaddr, 1, 0, 1, &old_page, &vma);
+ ret = get_user_pages_foreign(NULL, mm, vaddr, 1, 0, 1, &old_page, &vma);
if (ret <= 0)
return ret;

@@ -1699,7 +1699,7 @@ static int is_trap_at_addr(struct mm_str
if (likely(result == 0))
goto out;

- result = get_user_pages(NULL, mm, vaddr, 1, 0, 1, &page, NULL);
+ result = get_user_pages(vaddr, 1, 0, 1, &page, NULL);

Yeah it seems that mm here is current->mm, and using current task instead of NULL affects AFAICS just the min/maj fault counting, but isn't it still a subtle and unintended functional change?

[...]

-__always_inline long __get_user_pages_unlocked(struct task_struct *tsk, struct mm_struct *mm,
- unsigned long start, unsigned long nr_pages,
+__always_inline long __get_user_pages_unlocked(unsigned long start, unsigned long nr_pages,
int write, int force, struct page **pages,
unsigned int gup_flags)

This is the IMHO unneeded inconsistency what I mentioned above...

diff -puN mm/process_vm_access.c~get_current_user_pages mm/process_vm_access.c
--- a/mm/process_vm_access.c~get_current_user_pages 2016-01-15 09:45:42.120046515 -0800
+++ b/mm/process_vm_access.c 2016-01-15 09:45:42.157048177 -0800
@@ -99,8 +99,10 @@ static int process_vm_rw_single_vec(unsi
size_t bytes;

/* Get the pages we're interested in */
- pages = get_user_pages_unlocked(task, mm, pa, pages,
- vm_write, 0, process_pages);
+ down_read(&mm->mmap_sem);
+ pages = get_user_pages_foreign(task, mm, pa, pages, vm_write,
+ 0, process_pages, NULL);
+ up_read(&mm->mmap_sem);

You could have simply used __get_user_pages_unlocked() if it wasn't changed.

@@ -80,7 +80,7 @@ static void async_pf_execute(struct work

might_sleep();

- get_user_pages_unlocked(NULL, mm, addr, 1, 1, 0, NULL);
+ get_user_pages_unlocked(addr, 1, 1, 0, NULL);

This seems to get mm from some structure where struct work is embedded, are you sure it's current->mm? I think it's another place for __get_user_pages_unlocked().

static inline int check_user_page_hwpoison(unsigned long addr)
@@ -1344,12 +1345,10 @@ static int hva_to_pfn_slow(unsigned long

if (async) {
down_read(&current->mm->mmap_sem);
- npages = get_user_page_nowait(current, current->mm,
- addr, write_fault, page);
+ npages = get_user_page_nowait(addr, write_fault, page);
up_read(&current->mm->mmap_sem);
} else
- npages = __get_user_pages_unlocked(current, current->mm, addr, 1,
- write_fault, 0, page,
+ npages = __get_user_pages_unlocked(addr, 1, write_fault, 0, page,
FOLL_TOUCH|FOLL_HWPOISON);
if (npages != 1)
return npages;

If you change __get_user_pages_unlocked() as I suggested, you could use get_user_pages_unlocked() here?