[tip:perf/core] perf/x86: Optimize stack walk user accesses

From: tip-bot for Andi Kleen
Date: Mon Nov 23 2015 - 11:25:44 EST


Commit-ID: 75925e1ad7f5a4e867bd14ff8e7f114ea1596434
Gitweb: http://git.kernel.org/tip/75925e1ad7f5a4e867bd14ff8e7f114ea1596434
Author: Andi Kleen <ak@xxxxxxxxxxxxxxx>
AuthorDate: Thu, 22 Oct 2015 15:07:21 -0700
Committer: Ingo Molnar <mingo@xxxxxxxxxx>
CommitDate: Mon, 23 Nov 2015 09:58:25 +0100

perf/x86: Optimize stack walk user accesses

Change the perf user stack walking to use the new
__copy_from_user_nmi(), and split each access into word sized transfer
sizes. This allows to inline the complete access and optimize it all
into a single load.

The main advantage is that this avoids the overhead of double page
faults. When normal copy_from_user() fails it reexecutes the copy to
compute an accurate number of non copied bytes. This leads to
executing the expensive page fault twice.

While walking stacks having a fault at some point is relatively common
(typically when some part of the program isn't compiled with frame
pointers), so this is a large overhead.

With the optimized copies we avoid this problem because they only do
all accesses once. And of course they're much faster too when the
access does not fault because they're just single instructions instead
of complex function calls.

While profiling a kernel build with -g, the patch brings down the
average time of the PMI handler from 966ns to 552ns (-43%).

Signed-off-by: Andi Kleen <ak@xxxxxxxxxxxxxxx>
Signed-off-by: Peter Zijlstra (Intel) <peterz@xxxxxxxxxxxxx>
Cc: Arnaldo Carvalho de Melo <acme@xxxxxxxxxx>
Cc: Jiri Olsa <jolsa@xxxxxxxxxx>
Cc: Linus Torvalds <torvalds@xxxxxxxxxxxxxxxxxxxx>
Cc: Mike Galbraith <efault@xxxxxx>
Cc: Peter Zijlstra <peterz@xxxxxxxxxxxxx>
Cc: Stephane Eranian <eranian@xxxxxxxxxx>
Cc: Thomas Gleixner <tglx@xxxxxxxxxxxxx>
Cc: Vince Weaver <vincent.weaver@xxxxxxxxx>
Link: http://lkml.kernel.org/r/1445551641-13379-2-git-send-email-andi@xxxxxxxxxxxxxx
Signed-off-by: Ingo Molnar <mingo@xxxxxxxxxx>
---
arch/x86/kernel/cpu/perf_event.c | 22 +++++++++++++++++++---
1 file changed, 19 insertions(+), 3 deletions(-)

diff --git a/arch/x86/kernel/cpu/perf_event.c b/arch/x86/kernel/cpu/perf_event.c
index 2bf79d7..9dfbba5 100644
--- a/arch/x86/kernel/cpu/perf_event.c
+++ b/arch/x86/kernel/cpu/perf_event.c
@@ -2250,12 +2250,19 @@ perf_callchain_user32(struct pt_regs *regs, struct perf_callchain_entry *entry)
ss_base = get_segment_base(regs->ss);

fp = compat_ptr(ss_base + regs->bp);
+ pagefault_disable();
while (entry->nr < PERF_MAX_STACK_DEPTH) {
unsigned long bytes;
frame.next_frame = 0;
frame.return_address = 0;

- bytes = copy_from_user_nmi(&frame, fp, sizeof(frame));
+ if (!access_ok(VERIFY_READ, fp, 8))
+ break;
+
+ bytes = __copy_from_user_nmi(&frame.next_frame, fp, 4);
+ if (bytes != 0)
+ break;
+ bytes = __copy_from_user_nmi(&frame.return_address, fp+4, 4);
if (bytes != 0)
break;

@@ -2265,6 +2272,7 @@ perf_callchain_user32(struct pt_regs *regs, struct perf_callchain_entry *entry)
perf_callchain_store(entry, cs_base + frame.return_address);
fp = compat_ptr(ss_base + frame.next_frame);
}
+ pagefault_enable();
return 1;
}
#else
@@ -2302,12 +2310,19 @@ perf_callchain_user(struct perf_callchain_entry *entry, struct pt_regs *regs)
if (perf_callchain_user32(regs, entry))
return;

+ pagefault_disable();
while (entry->nr < PERF_MAX_STACK_DEPTH) {
unsigned long bytes;
frame.next_frame = NULL;
frame.return_address = 0;

- bytes = copy_from_user_nmi(&frame, fp, sizeof(frame));
+ if (!access_ok(VERIFY_READ, fp, 16))
+ break;
+
+ bytes = __copy_from_user_nmi(&frame.next_frame, fp, 8);
+ if (bytes != 0)
+ break;
+ bytes = __copy_from_user_nmi(&frame.return_address, fp+8, 8);
if (bytes != 0)
break;

@@ -2315,8 +2330,9 @@ perf_callchain_user(struct perf_callchain_entry *entry, struct pt_regs *regs)
break;

perf_callchain_store(entry, frame.return_address);
- fp = frame.next_frame;
+ fp = (void __user *)frame.next_frame;
}
+ pagefault_enable();
}

/*
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/