Re: [PATCH 1/1] Revert "perf report: Append inlines to non-DWARF callchains"

From: Jesper Dangaard Brouer
Date: Tue Aug 08 2023 - 13:13:14 EST




On 07/08/2023 16.03, Artem Savkov wrote:
On Mon, Aug 07, 2023 at 10:34:44AM -0300, Arnaldo Carvalho de Melo wrote:
Em Mon, Aug 07, 2023 at 01:00:08PM +0200, Artem Savkov escreveu:
On Wed, Aug 02, 2023 at 09:43:40AM +0200, Artem Savkov wrote:
Hi Arnaldo,

On Tue, Aug 01, 2023 at 06:42:47PM -0300, Arnaldo Carvalho de Melo wrote:
Hi Artem,

Can you please double check this? I reproduced with:

git checkout 46d21ec067490ab9cdcc89b9de5aae28786a8b8e
build it
perf record -a -g sleep 5s
perf report

Do you get the same slowness, and then, after reverting it (i.e. just
going to HEAD~ and rebuilding), a fast 'perf report' startup, i.e.
without the inlines in the callchains?

With a simple test like this I definitely get a slowdown, but not sure
if it can be called excessive.

Below are the times I got by running 'time perf report' and hitting 'q'
during load so that it quits as soon as it loads up. Tested on a
freshly updated Fedora 38.


I reported this problem to ACME. It can also be reproduced without
hitting 'q' by using --stdio, like this:

time perf report -v --stdio > /dev/null 2> debug01.stderr

The file 'debug01.stderr' contained a lot of addr2line output, which
might help debug the issue further.



My bad, I had the wrong debuginfo installed for the kernel I tested. I
can reproduce it with the correct one. Looks like vmlinux is just too much
for addr2line. Maybe we can skip it but leave other inlines in, like so:

That is a possibility, and probably we could make it cheaper by looking
at the cpumode, avoiding calling addr2line when we didn't manage to
resolve the symbol, etc.
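
A minimal sketch of the kind of cheap pre-check suggested above, assuming
the intent is to skip inline resolution for kernel-mode samples and for
entries without a resolved symbol. The helper want_inlines() and its
parameters are hypothetical and not part of perf; the PERF_RECORD_MISC_*
values mirror include/uapi/linux/perf_event.h and are redefined here only
to keep the example self-contained:

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Same values as include/uapi/linux/perf_event.h, redefined for the sketch. */
#define PERF_RECORD_MISC_CPUMODE_MASK	(7 << 0)
#define PERF_RECORD_MISC_KERNEL		(1 << 0)
#define PERF_RECORD_MISC_USER		(2 << 0)

/*
 * Hypothetical helper: decide whether it is worth trying to resolve
 * inlines for one callchain entry, before any addr2line work is done.
 */
static bool want_inlines(bool branch, uint8_t cpumode, const void *sym)
{
	/* Branch entries never got inlines appended in the original patch. */
	if (branch)
		return false;

	/* No resolved symbol: nothing useful to look up. */
	if (sym == NULL)
		return false;

	/* Assumption: kernel-mode samples are the expensive vmlinux case. */
	return (cpumode & PERF_RECORD_MISC_CPUMODE_MASK) != PERF_RECORD_MISC_KERNEL;
}
```

Compared with the strcmp() on the DSO short name, a check like this is a
couple of integer and pointer comparisons per callchain entry.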

We also may want to have this as an option that has to be explicitly
enabled, like --resolve-inlines, as this will add overhead no matter if
we stop calling addr2line and do it more efficiently, etc.
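
A rough sketch of the opt-in idea, assuming a boolean that defaults to off
and is only set when the user passes the flag. perf has its own option
parsing code, which is not used here; struct report_opts and
parse_report_opts() are hypothetical names for illustration, and
--resolve-inlines is only the name floated in this thread:

```c
#include <stdbool.h>
#include <string.h>

/* Hypothetical container for the one setting discussed here. */
struct report_opts {
	bool resolve_inlines;	/* off unless explicitly requested */
};

/* Scan argv for the proposed flag; everything else is ignored. */
static void parse_report_opts(struct report_opts *opts, int argc, const char **argv)
{
	opts->resolve_inlines = false;

	for (int i = 1; i < argc; i++) {
		if (strcmp(argv[i], "--resolve-inlines") == 0)
			opts->resolve_inlines = true;
	}
}
```

With a default of off, a plain 'perf report' would keep its current
startup time and only users who ask for inlines would pay for them.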

Sounds good, I'll look into it.

Fact is, we're late in the 6.5 schedule, so the best thing now is to
just revert the patch and then try again later, ok?

Yes, sure.

- Arnaldo
diff --git a/tools/perf/util/machine.c b/tools/perf/util/machine.c
index 11de3ca8d4fa7..fef309cd401f7 100644
--- a/tools/perf/util/machine.c
+++ b/tools/perf/util/machine.c
@@ -2388,7 +2388,9 @@ static int add_callchain_ip(struct thread *thread,
 	ms.map = map__get(al.map);
 	ms.sym = al.sym;
-	if (!branch && append_inlines(cursor, &ms, ip) == 0)
+	if (!branch && ms.map && ms.map->dso &&
+	    strcmp(ms.map->dso->short_name, "[kernel.vmlinux]") &&
+	    append_inlines(cursor, &ms, ip) == 0)
 		goto out;
 	srcline = callchain_srcline(&ms, al.addr);

- Arnaldo

----

This reverts commit 46d21ec067490ab9cdcc89b9de5aae28786a8b8e.

The tests were made with a specific workload; further tests on a
recently updated Fedora 38 system with a system-wide perf.data file
show 'perf report' taking excessive time, so let's revert this until a
full investigation and improvement on the addr2line support code is
made.


Reported-by: Jesper Dangaard Brouer <hawk@xxxxxxxxxx>

Cc: Andrii Nakryiko <andrii.nakryiko@xxxxxxxxx>
Cc: Artem Savkov <asavkov@xxxxxxxxxx>
Cc: Namhyung Kim <namhyung@xxxxxxxxxx>
Cc: Adrian Hunter <adrian.hunter@xxxxxxxxx>
Cc: Alexander Shishkin <alexander.shishkin@xxxxxxxxxxxxxxx>
Cc: Ian Rogers <irogers@xxxxxxxxxx>
Cc: Ingo Molnar <mingo@xxxxxxxxxx>
Cc: Jiri Olsa <jolsa@xxxxxxxxxx>
Cc: Mark Rutland <mark.rutland@xxxxxxx>
Cc: Masami Hiramatsu <mhiramat@xxxxxxxxxx>
Cc: Milian Wolff <milian.wolff@xxxxxxxx>
Cc: Peter Zijlstra <peterz@xxxxxxxxxxxxx>
Signed-off-by: Arnaldo Carvalho de Melo <acme@xxxxxxxxxx>
---
tools/perf/util/machine.c | 5 -----
1 file changed, 5 deletions(-)

diff --git a/tools/perf/util/machine.c b/tools/perf/util/machine.c
index 4e62843d51b7dbf9..f4cb41ee23cdbcfc 100644
--- a/tools/perf/util/machine.c
+++ b/tools/perf/util/machine.c
@@ -45,7 +45,6 @@
 static void __machine__remove_thread(struct machine *machine, struct thread_rb_node *nd,
 				     struct thread *th, bool lock);
-static int append_inlines(struct callchain_cursor *cursor, struct map_symbol *ms, u64 ip);
 static struct dso *machine__kernel_dso(struct machine *machine)
 {
@@ -2385,10 +2384,6 @@ static int add_callchain_ip(struct thread *thread,
 	ms.maps = maps__get(al.maps);
 	ms.map = map__get(al.map);
 	ms.sym = al.sym;
-
-	if (!branch && append_inlines(cursor, &ms, ip) == 0)
-		goto out;
-
 	srcline = callchain_srcline(&ms, al.addr);
 	err = callchain_cursor_append(cursor, ip, &ms,
 				      branch, flags, nr_loop_iter,

Tested-by: Jesper Dangaard Brouer <hawk@xxxxxxxxxx>

--Jesper