Re: [PATCHSET 0/9] perf report: Improve srcline sort performance (v1)

From: Ian Rogers
Date: Thu Dec 15 2022 - 15:28:29 EST


On Thu, Dec 15, 2022 at 11:28 AM Namhyung Kim <namhyung@xxxxxxxxxx> wrote:
>
> Hello,
>
> I noticed a performance problem in the srcline/srcfile processing during
> perf report when it's using an external addr2line process. I guess it's
> also helpful even if it uses the libbfd to get the srcline info.
>
> Also note that it's mostly from large (static) binaries, but smaller
> binaries should also benefit from the fix if they have a lot of samples.
>
> The first 5 patches are general fixes and updates. The latter 4 patches
> implemented the actual speed-up.
>
> Let's test it with the perf tools itself. Build a static binary like below.
>
> $ cd tools/perf
> $ make NO_JVMTI=1 LDFLAGS=-static
>
> Then run the perf test workload.
>
> $ ./perf record -- ./perf test -w noploop
>
> And run the perf report with srcline sort key like this.
>
> $ ./perf report -n -s srcline --stdio
> # To display the perf.data header info, please use --header/--header-only options.
> #
> #
> # Total Lost Samples: 0
> #
> # Samples: 4K of event 'cycles:u'
> # Event count (approx.): 3572938596
> #
> # Overhead Samples Source:Line
> # ........ ............ ............
> #
> 99.94% 4010 noploop.c:26
> 0.03% 14 ??:0
> 0.03% 1 perf.c:330
> 0.00% 1 wcscpy.o:0
>
> The problem is that it runs the addr2line when it processes each sample.
> But as you can see many samples can have same result. IOW, if the samples
> have same address, we don't need to run the addr2line each time.
>
> So I changed the sort_key->cmp() to compare the addresses only and moved
> the addr2line from sort_key->collapse() so that they can be run after
> merging the samples with the same address.
>
> With the change, I can get a huge speed-up in processing srcline info
> while they generate the same output.
>
> Before:
>
> $ ./perf stat -- ./perf report -s srcline > /dev/null
>
> Performance counter stats for './perf report -s srcline':
>
> 15,397.13 msec task-clock:u # 0.993 CPUs utilized
> 0 context-switches:u # 0.000 /sec
> 0 cpu-migrations:u # 0.000 /sec
> 3,810 page-faults:u # 247.449 /sec
> 54,516,351,820 cycles:u # 3.541 GHz
> 31,494,118,293 instructions:u # 0.58 insn per cycle
> 8,577,271,187 branches:u # 557.069 M/sec
> 1,216,165,520 branch-misses:u # 14.18% of all branches
>
> 15.505066606 seconds time elapsed
>
> 15.094122000 seconds user
> 0.396962000 seconds sys
>
> After:
>
> $ ./perf stat -- ./perf report -s srcline > /dev/null
>
> Performance counter stats for './perf report -s srcline':
>
> 105.66 msec task-clock:u # 0.994 CPUs utilized
> 0 context-switches:u # 0.000 /sec
> 0 cpu-migrations:u # 0.000 /sec
> 3,275 page-faults:u # 30.995 K/sec
> 185,063,407 cycles:u # 1.751 GHz
> 142,470,215 instructions:u # 0.77 insn per cycle
> 34,584,038 branches:u # 327.311 M/sec
> 3,226,005 branch-misses:u # 9.33% of all branches
>
> 0.106270464 seconds time elapsed
>
> 0.074254000 seconds user
> 0.032871000 seconds sys
>
> The code is available at 'perf/srcline-v1' branch in
>
> git://git.kernel.org/pub/scm/linux/kernel/git/namhyung/linux-perf.git
>
> Thanks,
> Namhyung
>
>
> Namhyung Kim (9):
> perf srcline: Do not return NULL for srcline
> perf report: Ignore SIGPIPE for srcline
> perf symbol: Add filename__has_section()
> perf srcline: Skip srcline if .debug_line is missing
> perf srcline: Conditionally suppress addr2line warnings
> perf hist: Add perf_hpp_fmt->init() callback
> perf hist: Improve srcline sort key performance
> perf hist: Improve srcfile sort key performance
> perf hist: Improve srcline_{from,to} sort key performance

Nice 145x performance win!

Series: Acked-by: Ian Rogers <irogers@xxxxxxxxxx>

Thanks,
Ian

> tools/perf/builtin-report.c | 1 +
> tools/perf/util/hist.c | 10 +--
> tools/perf/util/hist.h | 1 +
> tools/perf/util/sort.c | 129 ++++++++++++++++++++++++++++---
> tools/perf/util/sort.h | 1 +
> tools/perf/util/srcline.c | 20 +++--
> tools/perf/util/symbol-elf.c | 28 +++++++
> tools/perf/util/symbol-minimal.c | 5 ++
> tools/perf/util/symbol.h | 1 +
> 9 files changed, 176 insertions(+), 20 deletions(-)
>
>
> base-commit: 818448e9cf92e5c6b3c10320372eefcbe4174e4f
> --
> 2.39.0.314.g84b9a713c41-goog
>