Re: [PATCH V2] libperf evlist: Fix per-thread mmaps for multi-threaded targets

From: Adrian Hunter
Date: Wed Sep 07 2022 - 00:52:18 EST


On 6/09/22 20:50, Namhyung Kim wrote:
> On Mon, Sep 5, 2022 at 4:42 AM Adrian Hunter <adrian.hunter@xxxxxxxxx> wrote:
>>
>> The offending commit removed mmap_per_thread(), which did not consider
>> the different set-output rules for per-thread mmaps i.e. in the per-thread
>> case set-output is used for file descriptors of the same thread not the
>> same cpu.
>>
>> This was not immediately noticed because it only happens with
>> multi-threaded targets and we do not have a test for that yet.
>
> Yeah, this is unfortunate. I feel like I need to spend some time on it.
>
>>
>> Reinstate mmap_per_thread() expanding it to cover also system-wide per-cpu
>> events i.e. to continue to allow the mixing of per-thread and per-cpu
>> mmaps.
>>
>> Debug messages (with -vv) show the file descriptors that are opened with
>> sys_perf_event_open. New debug messages are added (needs -vvv) that show
>> also which file descriptors are mmapped and which are redirected with
>> set-output.
>>
>> In the per-cpu case (cpu != -1) file descriptors for the same CPU are
>> set-output to the first file descriptor for that CPU.
>>
>> In the per-thread case (cpu == -1) file descriptors for the same thread are
>> set-output to the first file descriptor for that thread.
>>
>> Example (process 17489 has 2 threads):
>>
>> Before (but with new debug prints):
>>
>> $ perf record --no-bpf-event -vvv --per-thread -p 17489
>> <SNIP>
>> sys_perf_event_open: pid 17489 cpu -1 group_fd -1 flags 0x8 = 5
>> sys_perf_event_open: pid 17490 cpu -1 group_fd -1 flags 0x8 = 6
>> <SNIP>
>> libperf: idx 0: mmapping fd 5
>> libperf: idx 0: set output fd 6 -> 5
>> failed to mmap with 22 (Invalid argument)
>>
>> After:
>>
>> $ perf record --no-bpf-event -vvv --per-thread -p 17489
>> <SNIP>
>> sys_perf_event_open: pid 17489 cpu -1 group_fd -1 flags 0x8 = 5
>> sys_perf_event_open: pid 17490 cpu -1 group_fd -1 flags 0x8 = 6
>> <SNIP>
>> libperf: mmap_per_thread: nr cpu values (may include -1) 1 nr threads 2
>> libperf: idx 0: mmapping fd 5
>> libperf: idx 1: mmapping fd 6
>> <SNIP>
>> [ perf record: Woken up 2 times to write data ]
>> [ perf record: Captured and wrote 0.018 MB perf.data (15 samples) ]
>
> It'd be nice if the example had 2 events so that it could check the
> set-output rule actually worked.

Here is an Intel PT example

$ perf record -vvv --per-thread -p 1521 -e intel_pt//
Using CPUID GenuineIntel-6-8C-1
intel_pt default config: tsc,mtc,mtc_period=3,psb_period=3,pt,branch
DEBUGINFOD_URLS=
nr_cblocks: 0
affinity: SYS
mmap flush: 1
comp level: 0
------------------------------------------------------------
perf_event_attr:
type 9
size 128
config 0x300e601
{ sample_period, sample_freq } 1
sample_type IP|TID|IDENTIFIER
read_format ID
disabled 1
sample_id_all 1
exclude_guest 1
aux_watermark 1048576
------------------------------------------------------------
sys_perf_event_open: pid 1521 cpu -1 group_fd -1 flags 0x8 = 5
sys_perf_event_open: pid 1522 cpu -1 group_fd -1 flags 0x8 = 6
------------------------------------------------------------
perf_event_attr:
type 1
size 128
config 0x9
{ sample_period, sample_freq } 1
sample_type IP|TID|IDENTIFIER
read_format ID
disabled 1
exclude_kernel 1
exclude_hv 1
mmap 1
comm 1
task 1
sample_id_all 1
exclude_guest 1
mmap2 1
comm_exec 1
bpf_event 1
------------------------------------------------------------
sys_perf_event_open: pid 1521 cpu -1 group_fd -1 flags 0x8 = 7
sys_perf_event_open: pid 1522 cpu -1 group_fd -1 flags 0x8 = 8
------------------------------------------------------------
perf_event_attr:
type 1
size 128
config 0x9
{ sample_period, sample_freq } 1
sample_type IP|TID|TIME|IDENTIFIER
read_format ID
exclude_kernel 1
exclude_hv 1
sample_id_all 1
exclude_guest 1
ksymbol 1
text_poke 1
------------------------------------------------------------
sys_perf_event_open: pid -1 cpu 0 group_fd -1 flags 0x8 = 9
sys_perf_event_open: pid -1 cpu 1 group_fd -1 flags 0x8 = 10
sys_perf_event_open: pid -1 cpu 2 group_fd -1 flags 0x8 = 11
sys_perf_event_open: pid -1 cpu 3 group_fd -1 flags 0x8 = 12
sys_perf_event_open: pid -1 cpu 4 group_fd -1 flags 0x8 = 13
sys_perf_event_open: pid -1 cpu 5 group_fd -1 flags 0x8 = 14
sys_perf_event_open: pid -1 cpu 6 group_fd -1 flags 0x8 = 15
sys_perf_event_open: pid -1 cpu 7 group_fd -1 flags 0x8 = 16
mmap size 528384B
AUX area mmap length 4194304
libperf: mmap_per_thread: nr cpu values (may include -1) 9 nr threads 2
libperf: idx 0: mmapping fd 5
libperf: idx 0: set output fd 7 -> 5
libperf: idx 1: mmapping fd 6
libperf: idx 1: set output fd 8 -> 6
libperf: idx 2: mmapping fd 9
libperf: idx 3: mmapping fd 10
libperf: idx 4: mmapping fd 11
libperf: idx 5: mmapping fd 12
libperf: idx 6: mmapping fd 13
libperf: idx 7: mmapping fd 14
libperf: idx 8: mmapping fd 15
libperf: idx 9: mmapping fd 16
Control descriptor is not initialized
thread_data[0x5572a73ec2f0]: nr_mmaps=10, maps=0x5572a73ec380, ow_maps=(nil)
thread_data[0x5572a73ec2f0]: cpu-1: maps[0] -> mmap[0]
thread_data[0x5572a73ec2f0]: cpu0: maps[1] -> mmap[1]
thread_data[0x5572a73ec2f0]: cpu1: maps[2] -> mmap[2]
thread_data[0x5572a73ec2f0]: cpu2: maps[3] -> mmap[3]
thread_data[0x5572a73ec2f0]: cpu3: maps[4] -> mmap[4]
thread_data[0x5572a73ec2f0]: cpu4: maps[5] -> mmap[5]
thread_data[0x5572a73ec2f0]: cpu5: maps[6] -> mmap[6]
thread_data[0x5572a73ec2f0]: cpu6: maps[7] -> mmap[7]
thread_data[0x5572a73ec2f0]: cpu7: maps[8] -> mmap[8]
thread_data[0x5572a73ec2f0]: cpu-1: maps[9] -> mmap[9]
thread_data[0x5572a73ec2f0]: pollfd[0] <- event_fd=5
thread_data[0x5572a73ec2f0]: pollfd[1] <- event_fd=7
thread_data[0x5572a73ec2f0]: pollfd[2] <- event_fd=6
thread_data[0x5572a73ec2f0]: pollfd[3] <- event_fd=8
thread_data[0x5572a73ec2f0]: pollfd[4] <- non_perf_event fd=4
------------------------------------------------------------
perf_event_attr:
type 1
size 128
config 0x9
watermark 1
sample_id_all 1
bpf_event 1
{ wakeup_events, wakeup_watermark } 1
------------------------------------------------------------
sys_perf_event_open: pid 1521 cpu -1 group_fd -1 flags 0x8 = 17
sys_perf_event_open: pid 1522 cpu -1 group_fd -1 flags 0x8 = 18
mmap size 528384B
libperf: mmap_per_thread: nr cpu values (may include -1) 1 nr threads 2
libperf: idx 0: mmapping fd 17
libperf: idx 1: mmapping fd 18
Synthesizing TSC conversion information
Synthesizing id index
Synthesizing auxtrace information
auxtrace idx 1 old 0 head 0x2c0 diff 0x2c0
auxtrace idx 0 old 0 head 0x2060 diff 0x2060
[ perf record: Woken up 2 times to write data ]
[ perf record: Captured and wrote 0.029 MB perf.data ]




>
> Thanks,
> Namhyung
>
>>
>> Per-cpu example (process 20341 has 2 threads, same as above):
>>
>> $ perf record --no-bpf-event -vvv -p 20341
>> <SNIP>
>> sys_perf_event_open: pid 20341 cpu 0 group_fd -1 flags 0x8 = 5
>> sys_perf_event_open: pid 20342 cpu 0 group_fd -1 flags 0x8 = 6
>> sys_perf_event_open: pid 20341 cpu 1 group_fd -1 flags 0x8 = 7
>> sys_perf_event_open: pid 20342 cpu 1 group_fd -1 flags 0x8 = 8
>> sys_perf_event_open: pid 20341 cpu 2 group_fd -1 flags 0x8 = 9
>> sys_perf_event_open: pid 20342 cpu 2 group_fd -1 flags 0x8 = 10
>> sys_perf_event_open: pid 20341 cpu 3 group_fd -1 flags 0x8 = 11
>> sys_perf_event_open: pid 20342 cpu 3 group_fd -1 flags 0x8 = 12
>> sys_perf_event_open: pid 20341 cpu 4 group_fd -1 flags 0x8 = 13
>> sys_perf_event_open: pid 20342 cpu 4 group_fd -1 flags 0x8 = 14
>> sys_perf_event_open: pid 20341 cpu 5 group_fd -1 flags 0x8 = 15
>> sys_perf_event_open: pid 20342 cpu 5 group_fd -1 flags 0x8 = 16
>> sys_perf_event_open: pid 20341 cpu 6 group_fd -1 flags 0x8 = 17
>> sys_perf_event_open: pid 20342 cpu 6 group_fd -1 flags 0x8 = 18
>> sys_perf_event_open: pid 20341 cpu 7 group_fd -1 flags 0x8 = 19
>> sys_perf_event_open: pid 20342 cpu 7 group_fd -1 flags 0x8 = 20
>> <SNIP>
>> libperf: mmap_per_cpu: nr cpu values 8 nr threads 2
>> libperf: idx 0: mmapping fd 5
>> libperf: idx 0: set output fd 6 -> 5
>> libperf: idx 1: mmapping fd 7
>> libperf: idx 1: set output fd 8 -> 7
>> libperf: idx 2: mmapping fd 9
>> libperf: idx 2: set output fd 10 -> 9
>> libperf: idx 3: mmapping fd 11
>> libperf: idx 3: set output fd 12 -> 11
>> libperf: idx 4: mmapping fd 13
>> libperf: idx 4: set output fd 14 -> 13
>> libperf: idx 5: mmapping fd 15
>> libperf: idx 5: set output fd 16 -> 15
>> libperf: idx 6: mmapping fd 17
>> libperf: idx 6: set output fd 18 -> 17
>> libperf: idx 7: mmapping fd 19
>> libperf: idx 7: set output fd 20 -> 19
>> <SNIP>
>> [ perf record: Woken up 7 times to write data ]
>> [ perf record: Captured and wrote 0.020 MB perf.data (17 samples) ]
>>
>> Fixes: ae4f8ae16a07 ("libperf evlist: Allow mixing per-thread and per-cpu mmaps")
>> Signed-off-by: Adrian Hunter <adrian.hunter@xxxxxxxxx>