Re: [PATCH 1/2 v2] kprobe: Do not use uaccess functions to access kernel memory that can fault

From: Peter Zijlstra
Date: Mon Feb 25 2019 - 03:33:22 EST


On Sun, Feb 24, 2019 at 09:26:45AM -0800, Linus Torvalds wrote:
> PeterZ, do you remember the particular use case that triggered that
> commit 7c4788950ba5 ("x86/uaccess, sched/preempt: Verify access_ok()
> context")?

This one, if I'm not mistaken.

---

commit ae31fe51a3cceaa0cabdb3058f69669ecb47f12e
Author: Johannes Weiner <hannes@xxxxxxxxxxx>
Date: Tue Nov 22 10:57:42 2016 +0100

perf/x86: Restore TASK_SIZE check on frame pointer

The following commit:

75925e1ad7f5 ("perf/x86: Optimize stack walk user accesses")

... switched from copy_from_user_nmi() to __copy_from_user_nmi() with a manual
access_ok() check.

Unfortunately, copy_from_user_nmi() does an explicit check against TASK_SIZE,
whereas the access_ok() uses whatever the current address limit of the task is.

We are getting NMIs when __probe_kernel_read() has switched to KERNEL_DS, and
then see vmalloc faults when we access what looks like pointers into vmalloc
space:

[] WARNING: CPU: 3 PID: 3685731 at arch/x86/mm/fault.c:435 vmalloc_fault+0x289/0x290
[] CPU: 3 PID: 3685731 Comm: sh Tainted: G W 4.6.0-5_fbk1_223_gdbf0f40 #1
[] Call Trace:
[] <NMI> [<ffffffff814717d1>] dump_stack+0x4d/0x6c
[] [<ffffffff81076e43>] __warn+0xd3/0xf0
[] [<ffffffff81076f2d>] warn_slowpath_null+0x1d/0x20
[] [<ffffffff8104a899>] vmalloc_fault+0x289/0x290
[] [<ffffffff8104b5a0>] __do_page_fault+0x330/0x490
[] [<ffffffff8104b70c>] do_page_fault+0xc/0x10
[] [<ffffffff81794e82>] page_fault+0x22/0x30
[] [<ffffffff81006280>] ? perf_callchain_user+0x100/0x2a0
[] [<ffffffff8115124f>] get_perf_callchain+0x17f/0x190
[] [<ffffffff811512c7>] perf_callchain+0x67/0x80
[] [<ffffffff8114e750>] perf_prepare_sample+0x2a0/0x370
[] [<ffffffff8114e840>] perf_event_output+0x20/0x60
[] [<ffffffff8114aee7>] ? perf_event_update_userpage+0xc7/0x130
[] [<ffffffff8114ea01>] __perf_event_overflow+0x181/0x1d0
[] [<ffffffff8114f484>] perf_event_overflow+0x14/0x20
[] [<ffffffff8100a6e3>] intel_pmu_handle_irq+0x1d3/0x490
[] [<ffffffff8147daf7>] ? copy_user_enhanced_fast_string+0x7/0x10
[] [<ffffffff81197191>] ? vunmap_page_range+0x1a1/0x2f0
[] [<ffffffff811972f1>] ? unmap_kernel_range_noflush+0x11/0x20
[] [<ffffffff814f2056>] ? ghes_copy_tofrom_phys+0x116/0x1f0
[] [<ffffffff81040d1d>] ? x2apic_send_IPI_self+0x1d/0x20
[] [<ffffffff8100411d>] perf_event_nmi_handler+0x2d/0x50
[] [<ffffffff8101ea31>] nmi_handle+0x61/0x110
[] [<ffffffff8101ef94>] default_do_nmi+0x44/0x110
[] [<ffffffff8101f13b>] do_nmi+0xdb/0x150
[] [<ffffffff81795187>] end_repeat_nmi+0x1a/0x1e
[] [<ffffffff8147daf7>] ? copy_user_enhanced_fast_string+0x7/0x10
[] [<ffffffff8147daf7>] ? copy_user_enhanced_fast_string+0x7/0x10
[] [<ffffffff8147daf7>] ? copy_user_enhanced_fast_string+0x7/0x10
[] <<EOE>> <IRQ> [<ffffffff8115d05e>] ? __probe_kernel_read+0x3e/0xa0

Fix this by moving the valid_user_frame() check to before the uaccess
that loads the return address and the pointer to the next frame.

Signed-off-by: Johannes Weiner <hannes@xxxxxxxxxxx>
Signed-off-by: Peter Zijlstra (Intel) <peterz@xxxxxxxxxxxxx>
Cc: Alexander Shishkin <alexander.shishkin@xxxxxxxxxxxxxxx>
Cc: Arnaldo Carvalho de Melo <acme@xxxxxxxxxx>
Cc: Jiri Olsa <jolsa@xxxxxxxxxx>
Cc: Linus Torvalds <torvalds@xxxxxxxxxxxxxxxxxxxx>
Cc: Peter Zijlstra <peterz@xxxxxxxxxxxxx>
Cc: Stephane Eranian <eranian@xxxxxxxxxx>
Cc: Thomas Gleixner <tglx@xxxxxxxxxxxxx>
Cc: Vince Weaver <vincent.weaver@xxxxxxxxx>
Cc: linux-kernel@xxxxxxxxxxxxxxx
Fixes: 75925e1ad7f5 ("perf/x86: Optimize stack walk user accesses")
Signed-off-by: Ingo Molnar <mingo@xxxxxxxxxx>

diff --git a/arch/x86/events/core.c b/arch/x86/events/core.c
index d31735f37ed7..9d4bf3ab049e 100644
--- a/arch/x86/events/core.c
+++ b/arch/x86/events/core.c
@@ -2352,7 +2352,7 @@ perf_callchain_user32(struct pt_regs *regs, struct perf_callchain_entry_ctx *ent
frame.next_frame = 0;
frame.return_address = 0;

- if (!access_ok(VERIFY_READ, fp, 8))
+ if (!valid_user_frame(fp, sizeof(frame)))
break;

bytes = __copy_from_user_nmi(&frame.next_frame, fp, 4);
@@ -2362,9 +2362,6 @@ perf_callchain_user32(struct pt_regs *regs, struct perf_callchain_entry_ctx *ent
if (bytes != 0)
break;

- if (!valid_user_frame(fp, sizeof(frame)))
- break;
-
perf_callchain_store(entry, cs_base + frame.return_address);
fp = compat_ptr(ss_base + frame.next_frame);
}
@@ -2413,7 +2410,7 @@ perf_callchain_user(struct perf_callchain_entry_ctx *entry, struct pt_regs *regs
frame.next_frame = NULL;
frame.return_address = 0;

- if (!access_ok(VERIFY_READ, fp, sizeof(*fp) * 2))
+ if (!valid_user_frame(fp, sizeof(frame)))
break;

bytes = __copy_from_user_nmi(&frame.next_frame, fp, sizeof(*fp));
@@ -2423,9 +2420,6 @@ perf_callchain_user(struct perf_callchain_entry_ctx *entry, struct pt_regs *regs
if (bytes != 0)
break;

- if (!valid_user_frame(fp, sizeof(frame)))
- break;
-
perf_callchain_store(entry, frame.return_address);
fp = (void __user *)frame.next_frame;
}