Re: 'perf stat --null sleep 1' segfault related to evlist__for_each_entry with affinity == NULL

From: Ian Rogers
Date: Wed Mar 16 2022 - 03:14:34 EST


On Tue, Mar 15, 2022 at 5:50 AM Arnaldo Carvalho de Melo
<acme@xxxxxxxxxx> wrote:
>
> This bounced yesterday, resending :-\
>
> -----
>
> Hi Ian,
>
> I just noticed this:
>
> (gdb) run stat --null sleep 1
> `/var/home/acme/bin/perf' has changed; re-reading symbols.
> Starting program: /var/home/acme/bin/perf stat --null sleep 1
> [Thread debugging using libthread_db enabled]
> Using host libthread_db library "/lib64/libthread_db.so.1".
> [Detaching after fork from child process 900002]
>
> Program received signal SIGSEGV, Segmentation fault.
> 0x0000000000596790 in create_perf_stat_counter (evsel=0xda9580, config=0xcbeac0 <stat_config>, target=0xcbe8a0 <target>, cpu_map_idx=0) at util/stat.c:548
> 548 if (leader->core.nr_members > 1)
> Missing separate debuginfos, use: dnf debuginfo-install bzip2-libs-1.0.8-6.fc34.x86_64 cyrus-sasl-lib-2.1.27-8.fc34.x86_64 elfutils-debuginfod-client-0.186-1.fc34.x86_64 elfutils-libelf-0.186-1.fc34.x86_64 elfutils-libs-0.186-1.fc34.x86_64 glib2-2.68.4-1.fc34.x86_64 glibc-2.33-21.fc34.x86_64 keyutils-libs-1.6.1-2.fc34.x86_64 libbabeltrace-1.5.8-6.fc34.x86_64 libbrotli-1.0.9-4.fc34.x86_64 libcap-2.48-2.fc34.x86_64 libcom_err-1.45.6-5.fc34.x86_64 libcurl-7.76.1-12.fc34.x86_64 libgcc-11.2.1-9.fc34.x86_64 libidn2-2.3.2-1.fc34.x86_64 libnghttp2-1.43.0-2.fc34.x86_64 libpsl-0.21.1-3.fc34.x86_64 libssh-0.9.6-1.fc34.x86_64 libstdc++-11.2.1-9.fc34.x86_64 libunistring-0.9.10-10.fc34.x86_64 libunwind-1.4.0-5.fc34.x86_64 libuuid-2.36.2-1.fc34.x86_64 libxcrypt-4.4.28-1.fc34.x86_64 libzstd-1.5.2-1.fc34.x86_64 numactl-libs-2.0.14-3.fc34.x86_64 opencsd-1.1.1-1.fc34.x86_64 openldap-2.4.57-6.fc34.x86_64 openssl-libs-1.1.1l-2.fc34.x86_64 pcre-8.44-3.fc34.1.x86_64 perl-libs-5.32.1-477.fc34.x86_64 popt-1.18-4.fc34.x86_64 python3-libs-3.9.10-1.fc34.x86_64 slang-2.3.2-9.fc34.x86_64 xz-libs-5.2.5-5.fc34.x86_64
> (gdb) bt
> #0 0x0000000000596790 in create_perf_stat_counter (evsel=0xda9580, config=0xcbeac0 <stat_config>, target=0xcbe8a0 <target>, cpu_map_idx=0) at util/stat.c:548
> #1 0x0000000000438c2c in __run_perf_stat (argc=2, argv=0x7fffffffe1f0, run_idx=0) at builtin-stat.c:835
> #2 0x00000000004395d5 in run_perf_stat (argc=2, argv=0x7fffffffe1f0, run_idx=0) at builtin-stat.c:1047
> #3 0x000000000043ce17 in cmd_stat (argc=2, argv=0x7fffffffe1f0) at builtin-stat.c:2561
> #4 0x00000000004dc2d1 in run_builtin (p=0xcd5760 <commands+288>, argc=4, argv=0x7fffffffe1f0) at perf.c:313
> #5 0x00000000004dc538 in handle_internal_command (argc=4, argv=0x7fffffffe1f0) at perf.c:365
> #6 0x00000000004dc684 in run_argv (argcp=0x7fffffffe03c, argv=0x7fffffffe030) at perf.c:409
> #7 0x00000000004dca50 in main (argc=4, argv=0x7fffffffe1f0) at perf.c:539
> (gdb) p leader
> $1 = (struct evsel *) 0x0
> (gdb) fr 1
> #1 0x0000000000438c2c in __run_perf_stat (argc=2, argv=0x7fffffffe1f0, run_idx=0) at builtin-stat.c:835
> 835 if (create_perf_stat_counter(counter, &stat_config, &target,
> (gdb) p evsel_list
> $2 = (struct evlist *) 0xda9580
> (gdb) p evsel_list->core.nr_entries
> $3 = 0
> (gdb)
>
> I've bisected it to a patch I added that avoids setting up affinity when
> there is no need to do so, like when running 'perf stat sleep 1', this
> one:
>
> commit 49de179577e7b05b57f625bf05cdc60a72de38d0
> Author: Arnaldo Carvalho de Melo <acme@xxxxxxxxxx>
> Date: Mon Jan 17 13:09:29 2022 -0300
>
> perf stat: No need to setup affinities when starting a workload
>
> I.e. the simple:
>
> $ perf stat sleep 1
>
> Uses a dummy CPU map and thus there is no need to setup/cleanup
> affinities to avoid IPIs, etc.
>
> With this we're down to a sched_getaffinity() call, in the libnuma
> initialization, that probably can be removed in a followup patch.
>
> Acked-by: Ian Rogers <irogers@xxxxxxxxxx>
>
> The problem is that when using --null we end up with an empty evlist
> and:
>
> evlist__for_each_cpu(evlist_cpu_itr, evsel_list, affinity)
>
> in __run_perf_stat()
>
> gets called with affinity == NULL.
>
> Looking at its code it checks if affinity is NULL, etc, so I thought it
> would be ok, but it ends up passing, IIRC, the empty list head:
>
> (gdb) p evsel->core.node
> $4 = {next = 0xda9580, prev = 0xda9580}
> (gdb) p evsel
> $5 = (struct evsel *) 0xda9580
> (gdb) fr 1
> #1 0x0000000000438c2c in __run_perf_stat (argc=2, argv=0x7fffffffe1f0, run_idx=0) at builtin-stat.c:835
> 835 if (create_perf_stat_counter(counter, &stat_config, &target,
> (gdb) p evsel_list->core.entries
> $6 = {next = 0xda9580, prev = 0xda9580}
> (gdb)
>
> I.e. it gets confused with its end condition when
> evlist->core.nr_entries is zero.
>
> Can you take a look at improving this? I'm going AFK now, so just wanted
> to dump what I found so far.
>
> I guess the problem got introduced in:
>
> commit 472832d2c000b9611feaea66fe521055c3dbf17a
> Author: Ian Rogers <irogers@xxxxxxxxxx>
> Date: Tue Jan 4 22:13:37 2022 -0800
>
> perf evlist: Refactor evlist__for_each_cpu()
>
> Previously evlist__for_each_cpu() needed to iterate over the evlist in
> an inner loop and call "skip" routines. Refactor this so that the
> iteratr is smarter and the next function can update both the current CPU
> and evsel.
>
> By using a cpu map index, fix apparent off-by-1 in __run_perf_stat's
> call to perf_evsel__close_cpu().
>
> -----------------
>
> And was dormant as the affinity was being always setup, which we don't
> want when it isn't needed.

Thanks for the detailed report, I sent a fix in:
https://lore.kernel.org/lkml/20220316071049.2368250-1-irogers@xxxxxxxxxx/
Any thoughts for a shell test to avoid this regressing again in the future?

Thanks,
Ian

> I'm trying to use:
>
> perf trace --summary perf stat --null sleep 1
>
> to see what is that perf is unconditionally doing that should be
> deferred till it is really needed, for now I'll continue with:
>
> [root@five ~]# perf trace -s perf stat -e dummy sleep 1
>
> Performance counter stats for 'sleep 1':
>
> 0 dummy
>
> 1.000945692 seconds time elapsed
>
> 0.000000000 seconds user
> 0.000861000 seconds sys
>
>
>
> Summary of events:
>
> perf (900261), 3127 events, 94.7%
>
> syscall calls errors total min avg max stddev
> (msec) (msec) (msec) (msec) (%)
> --------------- -------- ------ -------- --------- --------- --------- ------
> wait4 1 0 1000.903 1000.903 1000.903 1000.903 0.00%
> openat 364 187 1.144 0.002 0.003 0.020 1.90%
> mmap 263 0 0.796 0.001 0.003 0.015 2.60%
> clone 1 0 0.620 0.620 0.620 0.620 0.00%
> read 263 0 0.441 0.001 0.002 0.094 21.06%
> access 80 49 0.277 0.002 0.003 0.006 2.75%
> mprotect 80 0 0.269 0.002 0.003 0.006 3.15%
> newfstatat 174 1 0.210 0.001 0.001 0.007 3.57%
> close 183 0 0.171 0.001 0.001 0.003 1.64%
> getdents64 35 0 0.065 0.001 0.002 0.012 17.24%
> write 18 0 0.052 0.001 0.003 0.009 15.24%
> brk 12 0 0.034 0.001 0.003 0.005 11.27%
> ioctl 29 29 0.029 0.001 0.001 0.003 8.38%
> futex 20 0 0.026 0.001 0.001 0.002 8.30%
> perf_event_open 1 0 0.016 0.016 0.016 0.016 0.00%
> statfs 3 0 0.011 0.002 0.004 0.006 38.92%
> rt_sigaction 8 0 0.009 0.001 0.001 0.002 12.76%
> pread64 6 0 0.006 0.001 0.001 0.001 1.44%
> pipe 2 0 0.006 0.002 0.003 0.004 32.45%
> rt_sigprocmask 4 0 0.006 0.001 0.001 0.002 22.35%
> prctl 6 4 0.005 0.001 0.001 0.001 6.86%
> munmap 1 0 0.004 0.004 0.004 0.004 0.00%
> kill 1 0 0.003 0.003 0.003 0.003 0.00%
> arch_prctl 2 1 0.002 0.001 0.001 0.001 11.96%
> sysinfo 1 0 0.002 0.002 0.002 0.002 0.00%
> sched_getaffinity 1 0 0.001 0.001 0.001 0.001 0.00%
> getpid 1 0 0.001 0.001 0.001 0.001 0.00%
> fcntl 1 0 0.001 0.001 0.001 0.001 0.00%
> set_tid_address 1 0 0.001 0.001 0.001 0.001 0.00%
> prlimit64 1 0 0.001 0.001 0.001 0.001 0.00%
> set_robust_list 1 0 0.001 0.001 0.001 0.001 0.00%
> execve 1 0 0.000 0.000 0.000 0.000 0.00%
>
>
> sleep (900262), 107 events, 3.2%
>
> syscall calls errors total min avg max stddev
> (msec) (msec) (msec) (msec) (%)
> --------------- -------- ------ -------- --------- --------- --------- ------
> clock_nanosleep 1 0 1000.050 1000.050 1000.050 1000.050 0.00%
> execve 8 7 0.282 0.008 0.035 0.198 66.26%
> read 2 0 0.058 0.002 0.029 0.056 94.56%
> mmap 9 0 0.027 0.002 0.003 0.005 9.92%
> mprotect 4 0 0.015 0.003 0.004 0.005 10.85%
> close 8 0 0.013 0.001 0.002 0.003 13.67%
> prctl 1 0 0.012 0.012 0.012 0.012 0.00%
> openat 3 0 0.010 0.003 0.003 0.004 4.99%
> arch_prctl 2 1 0.006 0.001 0.003 0.005 68.42%
> brk 3 0 0.006 0.001 0.002 0.002 21.26%
> access 1 1 0.005 0.005 0.005 0.005 0.00%
> munmap 1 0 0.005 0.005 0.005 0.005 0.00%
> newfstatat 3 0 0.005 0.001 0.002 0.002 8.55%
> pread64 4 0 0.005 0.001 0.001 0.001 5.39%
> rt_sigaction 1 0 0.002 0.002 0.002 0.002 0.00%
> fcntl 1 0 0.002 0.002 0.002 0.002 0.00%
> set_robust_list 1 0 0.002 0.002 0.002 0.002 0.00%
> clone 1 0 0.000 0.000 0.000 0.000 0.00%
>
>
> [root@five ~]#
>
> - Arnaldo