Re: [GIT PULL] x86/mm for 6.4

From: Linus Torvalds
Date: Fri Apr 28 2023 - 16:08:06 EST


On Thu, Apr 27, 2023 at 3:57 PM Dave Hansen <dave.hansen@xxxxxxxxxxxxxxx> wrote:
>
> Please pull some x86/mm changes for 6.4. The only content here is
> solely a new revision of Kirill's Linear Address Masking implementation.

So I was waiting for this for my final piece of the x86 user copy
changes, so here goes...

I think we should now make 'access_ok()' do the same thing that
get/put_user() do with the LAM code: only worry about the sign bit.

So here's my suggested change on top of the current tree. Comments?

PeterZ also added to the cc, because he's the source of that
WARN_ON_IN_IRQ() in the x86 'access_ok()' macro. That's the only
reason x86 has its own copy of that.

I wonder if that WARN_ON_IN_IRQ() should just be removed, or perhaps
moved into the generic code in <asm-generic/access_ok.h>?

Linus
From 2c283a649b07d180d5a7bcfaf539365297e82c23 Mon Sep 17 00:00:00 2001
From: Linus Torvalds <torvalds@xxxxxxxxxxxxxxxxxxxx>
Date: Fri, 28 Apr 2023 12:55:10 -0700
Subject: [PATCH] x86-64: make access_ok() independent of LAM

The linear address masking (LAM) code made access_ok() more complicated,
in that it now needs to untag the address in order to verify the access
range. See commit 74c228d20a51 ("x86/uaccess: Provide untagged_addr()
and remove tags before address check").

We were able to avoid that overhead in the get_user/put_user code paths
by simply using the sign bit for the address check, and depending on the
GP fault if the address was non-canonical, which made it all independent
of LAM.

And we can do the same thing for access_ok(): simply check that the user
pointer range has the high bit clear. No need to bother with any
address bit masking.

In fact, we can go a bit further, and just check the starting address
for known small accesses ranges: any accesses that overflow will still
be in the non-canonical area and will still GP fault.

To still make syzkaller catch any potentially unchecked user addresses,
we'll continue to warn about GP faults that are caused by accesses in
the non-canonical range. But we'll limit that to purely "high bit set
and past the one-page 'slop' area".

We could probably just do that "check only starting address" for any
arbitrary range size: realistically all kernel accesses to user space
will be done starting at the low address. But let's leave that kind of
optimization for later. As it is, this already allows us to generate
simpler code and not worry about any tag bits in the address.

The one thing to look out for is the GUP address check: instead of
actually copying data in the virtual address range (and thus bad
addresses being caught by the GP fault), GUP will look up the page
tables manually. As a result, the page table limits need to be checked,
and that was previously implicitly done by the access_ok().

With the relaxed access_ok() check, we need to just do an explicit check
for TASK_SIZE_MAX in the GUP code instead. The GUP code already needs
to do the tag bit unmasking anyway, so there this is all very
straightforward, and there are no LAM issues.

Cc: Kirill A. Shutemov <kirill.shutemov@xxxxxxxxxxxxxxx>
Cc: Dave Hansen <dave.hansen@xxxxxxxxxxxxxxx>
Cc: Peter Zijlstra (Intel) <peterz@xxxxxxxxxxxxx>
Signed-off-by: Linus Torvalds <torvalds@xxxxxxxxxxxxxxxxxxxx>
---
arch/x86/include/asm/uaccess.h | 39 +++++++++++++++++++++++++++++----
arch/x86/mm/extable.c | 40 +++++++++++++++++++++++++++++-----
mm/gup.c | 2 ++
3 files changed, 71 insertions(+), 10 deletions(-)

diff --git a/arch/x86/include/asm/uaccess.h b/arch/x86/include/asm/uaccess.h
index 457e814712af..123135d60f72 100644
--- a/arch/x86/include/asm/uaccess.h
+++ b/arch/x86/include/asm/uaccess.h
@@ -75,6 +75,34 @@ static inline unsigned long __untagged_addr_remote(struct mm_struct *mm,
#define untagged_addr(addr) (addr)
#endif

+#ifdef CONFIG_X86_64
+/*
+ * On x86-64, we may have tag bits in the user pointer. Rather than
+ * mask them off, just change the rules for __access_ok().
+ *
+ * Make the rule be that 'ptr+size' must not overflow, and must not
+ * have the high bit set. Compilers generally understand about
+ * unsigned overflow and the CF bit and generate reasonable code for
+ * this. Although it looks like the combination confuses at least
+ * clang (and instead of just doing an "add" followed by a test of
+ * SF and CF, you'll see that unnecessary comparison).
+ *
+ * For the common case of small sizes that can be checked at compile
+ * time, don't even bother with the addition, and just check that the
+ * base pointer is ok.
+ */
+static inline bool __access_ok(const void __user *ptr, unsigned long size)
+{
+ if (__builtin_constant_p(size <= PAGE_SIZE) && size <= PAGE_SIZE) {
+ return (long)ptr >= 0;
+ } else {
+ unsigned long sum = size + (unsigned long)ptr;
+ return (long) sum >= 0 && sum >= (unsigned long)ptr;
+ }
+}
+#define __access_ok __access_ok
+#endif
+
/**
* access_ok - Checks if a user space pointer is valid
* @addr: User space pointer to start of block to check
@@ -91,11 +119,14 @@ static inline unsigned long __untagged_addr_remote(struct mm_struct *mm,
*
* Return: true (nonzero) if the memory block may be valid, false (zero)
* if it is definitely invalid.
+ *
+ * This should not be x86-specific. The only odd things out here is
+ * the WARN_ON_IN_IRQ(), which doesn't exist in the generic version.
*/
-#define access_ok(addr, size) \
-({ \
- WARN_ON_IN_IRQ(); \
- likely(__access_ok(untagged_addr(addr), size)); \
+#define access_ok(addr, size) \
+({ \
+ WARN_ON_IN_IRQ(); \
+ likely(__access_ok(addr, size)); \
})

#include <asm-generic/access_ok.h>
diff --git a/arch/x86/mm/extable.c b/arch/x86/mm/extable.c
index 60814e110a54..8d38dedadbb1 100644
--- a/arch/x86/mm/extable.c
+++ b/arch/x86/mm/extable.c
@@ -130,10 +130,36 @@ static bool ex_handler_fprestore(const struct exception_table_entry *fixup,
return true;
}

+/*
+ * On x86-64, we end up being imprecise with 'access_ok()', and allow
+ * non-canonical user addresses to make the range comparisons simpler,
+ * and to not have to worry about LAM being enabled.
+ *
+ * In fact, we allow up to one page of "slop" at the sign boundary,
+ * which means that we can do access_ok() by just checking the sign
+ * of the pointer for the common case of having a small access size.
+ */
+static bool gp_fault_address_ok(unsigned long fault_address)
+{
+#ifdef CONFIG_X86_64
+ /* Is it in the "user space" part of the non-canonical space? */
+ if ((long) fault_address >= 0)
+ return true;
+
+ /* .. or just above it? */
+ fault_address -= PAGE_SIZE;
+ if ((long) fault_address >= 0)
+ return true;
+#endif
+ return false;
+}
+
static bool ex_handler_uaccess(const struct exception_table_entry *fixup,
- struct pt_regs *regs, int trapnr)
+ struct pt_regs *regs, int trapnr,
+ unsigned long fault_address)
{
- WARN_ONCE(trapnr == X86_TRAP_GP, "General protection fault in user access. Non-canonical address?");
+ WARN_ONCE(trapnr == X86_TRAP_GP && !gp_fault_address_ok(fault_address),
+ "General protection fault in user access. Non-canonical address?");
return ex_handler_default(fixup, regs);
}

@@ -189,10 +215,12 @@ static bool ex_handler_imm_reg(const struct exception_table_entry *fixup,
}

static bool ex_handler_ucopy_len(const struct exception_table_entry *fixup,
- struct pt_regs *regs, int trapnr, int reg, int imm)
+ struct pt_regs *regs, int trapnr,
+ unsigned long fault_address,
+ int reg, int imm)
{
regs->cx = imm * regs->cx + *pt_regs_nr(regs, reg);
- return ex_handler_uaccess(fixup, regs, trapnr);
+ return ex_handler_uaccess(fixup, regs, trapnr, fault_address);
}

int ex_get_fixup_type(unsigned long ip)
@@ -238,7 +266,7 @@ int fixup_exception(struct pt_regs *regs, int trapnr, unsigned long error_code,
case EX_TYPE_FAULT_MCE_SAFE:
return ex_handler_fault(e, regs, trapnr);
case EX_TYPE_UACCESS:
- return ex_handler_uaccess(e, regs, trapnr);
+ return ex_handler_uaccess(e, regs, trapnr, fault_addr);
case EX_TYPE_COPY:
return ex_handler_copy(e, regs, trapnr);
case EX_TYPE_CLEAR_FS:
@@ -269,7 +297,7 @@ int fixup_exception(struct pt_regs *regs, int trapnr, unsigned long error_code,
case EX_TYPE_FAULT_SGX:
return ex_handler_sgx(e, regs, trapnr);
case EX_TYPE_UCOPY_LEN:
- return ex_handler_ucopy_len(e, regs, trapnr, reg, imm);
+ return ex_handler_ucopy_len(e, regs, trapnr, fault_addr, reg, imm);
case EX_TYPE_ZEROPAD:
return ex_handler_zeropad(e, regs, fault_addr);
}
diff --git a/mm/gup.c b/mm/gup.c
index ff689c88a357..bbe416236593 100644
--- a/mm/gup.c
+++ b/mm/gup.c
@@ -2970,6 +2970,8 @@ static int internal_get_user_pages_fast(unsigned long start,
len = nr_pages << PAGE_SHIFT;
if (check_add_overflow(start, len, &end))
return 0;
+ if (end > TASK_SIZE_MAX)
+ return -EFAULT;
if (unlikely(!access_ok((void __user *)start, len)))
return -EFAULT;

--
2.39.1.437.g88ae5c7f81