Re: [PATCH] perf/record: make perf_event__synthesize_mmap_events() scale

From: Arnaldo Carvalho de Melo
Date: Wed Mar 15 2017 - 09:52:44 EST


Em Tue, Mar 14, 2017 at 11:57:21PM -0700, Stephane Eranian escreveu:
> This patch significantly improves the execution time of
> perf_event__synthesize_mmap_events() when running perf record
> on systems where processes have lots of threads. It just happens
> that cat /proc/pid/maps support uses a O(N^2) algorithm to generate
> each map line in the maps file. If you have 1000 threads, then you have
> necessarily 1000 stacks. For each vma, you need to check if it corresponds
> to a thread's stack. With a large number of threads, this can take a very long time. I have seen latencies >> 10mn.
>
> As of today, perf does not use the fact that a mapping is a stack,
> therefore we can work around the issue by using /proc/pid/tasks/pid/maps.
> This entry does not try to map a vma to stack and is thus much
> faster with no loss of functonality.
>
> The proc-map-timeout logic is kept in case user still want some uppre limit.
>
> Signed-off-by: Stephane Eranian <eranian@xxxxxxxxxx>
> ---
> tools/perf/util/event.c | 4 ++--
> 1 file changed, 2 insertions(+), 2 deletions(-)
>
> diff --git a/tools/perf/util/event.c b/tools/perf/util/event.c
> index 4ea7ce7..b137566 100644
> --- a/tools/perf/util/event.c
> +++ b/tools/perf/util/event.c
> @@ -255,8 +255,8 @@ int perf_event__synthesize_mmap_events(struct perf_tool *tool,
> if (machine__is_default_guest(machine))
> return 0;
>
> - snprintf(filename, sizeof(filename), "%s/proc/%d/maps",
> - machine->root_dir, pid);
> + snprintf(filename, sizeof(filename), "%s/proc/%d/tasks/%d/maps",
> + machine->root_dir, pid, pid);

Humm...

[root@jouet ~]# ps -C firefox -L
PID LWP TTY TIME CMD
25023 25023 tty2 01:32:10 firefox
25023 25048 tty2 00:00:00 gmain
25023 25049 tty2 00:00:00 gdbus
25023 25056 tty2 00:00:00 Gecko_IOThread
25023 25058 tty2 00:00:00 Link Monitor
25023 25059 tty2 00:03:52 Socket Thread
25023 25060 tty2 00:00:00 JS Watchdog
25023 25061 tty2 00:00:12 JS Helper
25023 25062 tty2 00:00:12 JS Helper
25023 25063 tty2 00:00:13 JS Helper
25023 25064 tty2 00:00:12 JS Helper
25023 25065 tty2 00:00:13 JS Helper
25023 25066 tty2 00:00:12 JS Helper
25023 25067 tty2 00:00:13 JS Helper
25023 25068 tty2 00:00:13 JS Helper
25023 25069 tty2 00:00:00 Hang Monitor
25023 25070 tty2 00:00:00 BgHangManager
25023 25117 tty2 00:00:12 Cache2 I/O
25023 25118 tty2 00:05:37 Timer
25023 25146 tty2 00:00:00 DataStorage
25023 25150 tty2 00:00:00 GMPThread
25023 25152 tty2 00:04:12 Compositor
25023 25156 tty2 00:00:02 ImgDecoder #1
25023 25157 tty2 00:00:02 ImgDecoder #2
25023 25158 tty2 00:00:02 ImgDecoder #3
25023 25159 tty2 00:00:00 ImageIO
25023 25160 tty2 00:02:26 SoftwareVsyncTh
25023 25202 tty2 00:00:00 HTML5 Parser
25023 25204 tty2 00:00:00 IPDL Background
25023 25212 tty2 00:00:22 DOM Worker
25023 25219 tty2 00:00:23 URL Classifier
25023 25220 tty2 00:00:05 ImageBridgeChil
25023 25221 tty2 00:00:04 mozStorage #1
25023 25222 tty2 00:00:00 Proxy R~olution
25023 25223 tty2 00:00:00 DataStorage
25023 25234 tty2 00:00:00 Cache I/O
25023 25235 tty2 00:00:01 mozStorage #2
25023 25236 tty2 00:00:00 mozStorage #3
25023 25252 tty2 00:00:20 DOM Worker
25023 25256 tty2 00:00:00 mozStorage #4
25023 25257 tty2 00:00:00 localStorage DB
25023 25260 tty2 00:00:00 mozStorage #5
25023 2066 tty2 00:00:19 DOM Worker
25023 5988 tty2 00:00:00 threaded-ml
25023 6757 tty2 00:00:00 firefox
25023 6999 tty2 00:00:00 mozStorage #6
25023 19239 tty2 00:00:02 mozStorage #7
25023 12038 tty2 00:00:05 threaded-ml
25023 19301 tty2 00:00:00 firefox
[root@jouet ~]#

[root@jouet ~]# file /proc/25023/tasks
/proc/25023/tasks: cannot open `/proc/25023/tasks' (No such file or directory)
[root@jouet ~]#

But...

[root@jouet ~]# file /proc/25023/task
/proc/25023/task: directory
[root@jouet ~]# wc -l /proc/25023/task/25023/maps
1200 /proc/25023/task/25023/maps
[root@jouet ~]# wc -l /proc/25023/task/25048/maps
1201 /proc/25023/task/25048/maps
[root@jouet ~]# wc -l /proc/25048/task/25048/maps
1201 /proc/25048/task/25048/maps
[root@jouet ~]#

[acme@jouet linux]$ grep \"tasks\" fs/proc/*.c
[acme@jouet linux]$ grep \"task\" fs/proc/*.c
fs/proc/base.c: DIR("task", S_IRUGO|S_IXUGO, proc_task_inode_operations, proc_task_operations),
fs/proc/base.c: name.name = "task";
[acme@jouet linux]$

These end up mapping to the later of these two:

const struct file_operations proc_pid_maps_operations = {
.open = pid_maps_open,
.read = seq_read,
.llseek = seq_lseek,
.release = proc_map_release,
};

const struct file_operations proc_tid_maps_operations = {
.open = tid_maps_open,
.read = seq_read,
.llseek = seq_lseek,
.release = proc_map_release,
};


Which ends up going down to:

show_map_vma(struct seq_file *m, struct vm_area_struct *vma, int is_pid)

with that is_pid respectively true and false, but then show_map_vma() doesn't
use 'is_pid' at all :-\

/me scratches head, what am I missing?

The following commit, that appeared circa v4.9-rc2. So, fixing up the "tasks"
-> "tasks" we end up with something safe and that avoids this by now
non-existing problem, on older kernels, ok?

commit b18cb64ead400c01bf1580eeba330ace51f8087d
Author: Andy Lutomirski <luto@xxxxxxxxxx>
Date: Fri Sep 30 10:58:57 2016 -0700

fs/proc: Stop trying to report thread stacks

This reverts more of:

b76437579d13 ("procfs: mark thread stack correctly in proc/<pid>/maps")

... which was partially reverted by:

65376df58217 ("proc: revert /proc/<pid>/maps [stack:TID] annotation")

Originally, /proc/PID/task/TID/maps was the same as /proc/TID/maps.

In current kernels, /proc/PID/maps (or /proc/TID/maps even for
threads) shows "[stack]" for VMAs in the mm's stack address range.

In contrast, /proc/PID/task/TID/maps uses KSTK_ESP to guess the
target thread's stack's VMA. This is racy, probably returns garbage
and, on arches with CONFIG_TASK_INFO_IN_THREAD=y, is also crash-prone:
KSTK_ESP is not safe to use on tasks that aren't known to be running
ordinary process-context kernel code.

This patch removes the difference and just shows "[stack]" for VMAs
in the mm's stack range. This is IMO much more sensible -- the
actual "stack" address really is treated specially by the VM code,
and the current thread stack isn't even well-defined for programs
that frequently switch stacks on their own.

Reported-by: Jann Horn <jann@xxxxxxxxx>
Signed-off-by: Andy Lutomirski <luto@xxxxxxxxxx>
Acked-by: Thomas Gleixner <tglx@xxxxxxxxxxxxx>


>
> fp = fopen(filename, "r");
> if (fp == NULL) {
> --
> 2.5.0