Re: [PATCH v5 0/4] Reduce NUMA related overhead in perf record profiling on large server systems

From: Jiri Olsa
Date: Mon Jan 28 2019 - 06:27:54 EST


On Tue, Jan 22, 2019 at 08:45:12PM +0300, Alexey Budankov wrote:

SNIP

> The patch set has been validated on BT benchmark from NAS Parallel
> Benchmarks [2] running on dual socket, 44 cores, 88 hw threads Broadwell
> system with kernels v4.4.0-21-generic (Ubuntu 16.04) and v4.20.0-rc5
> (tip perf/core).
>
> The patch set is for Arnaldo's perf/core repository.
>
> OVERHEAD:
>                                BENCH REPORT BASED   ELAPSED TIME BASED
>           v4.20.0-rc5
>           (tip perf/core):
>
> (current) SERIAL-SYS  / BASE : 1.27x (14.37/11.31), 1.29x (15.19/11.69)
>           SERIAL-NODE / BASE : 1.15x (13.04/11.31), 1.17x (13.79/11.69)
>           SERIAL-CPU  / BASE : 1.00x (11.32/11.31), 1.01x (11.89/11.69)
>
>           AIO1-SYS    / BASE : 1.29x (14.58/11.31), 1.29x (15.26/11.69)
>           AIO1-NODE   / BASE : 1.08x (12.23/11.31), 1.11x (13.01/11.69)
>           AIO1-CPU    / BASE : 1.07x (12.14/11.31), 1.08x (12.83/11.69)
>
>           v4.4.0-21-generic
>           (Ubuntu 16.04 LTS):
>
> (current) SERIAL-SYS  / BASE : 1.26x (13.73/10.87), 1.29x (14.69/11.32)
>           SERIAL-NODE / BASE : 1.19x (13.02/10.87), 1.23x (14.03/11.32)
>           SERIAL-CPU  / BASE : 1.03x (11.21/10.87), 1.07x (12.18/11.32)
>
>           AIO1-SYS    / BASE : 1.26x (13.73/10.87), 1.29x (14.69/11.32)
>           AIO1-NODE   / BASE : 1.10x (12.04/10.87), 1.15x (13.03/11.32)
>           AIO1-CPU    / BASE : 1.12x (12.20/10.87), 1.15x (13.09/11.32)
>
> ---
> Alexey Budankov (4):
>   perf record: allocate affinity masks
>   perf record: bind the AIO user space buffers to nodes
>   perf record: apply affinity masks when reading mmap buffers
>   perf record: implement --affinity=node|cpu option
>
>  tools/perf/Documentation/perf-record.txt |   5 ++
>  tools/perf/builtin-record.c              |  45 +++++++++-
>  tools/perf/perf.h                        |   8 ++
>  tools/perf/util/cpumap.c                 |  10 +++
>  tools/perf/util/cpumap.h                 |   1 +
>  tools/perf/util/evlist.c                 |   6 +-
>  tools/perf/util/evlist.h                 |   2 +-
>  tools/perf/util/mmap.c                   | 105 ++++++++++++++++++++++-
>  tools/perf/util/mmap.h                   |   3 +-
>  9 files changed, 175 insertions(+), 10 deletions(-)
>
> ---
> Changes in v5:
> - avoided multiple allocations of online cpu maps by
>   implementing it once in cpu_map__online()
> - reduced indentation at record__parse_affinity()

Reviewed-by: Jiri Olsa <jolsa@xxxxxxxxxx>

thanks,
jirka