Re: [PATCH v11 perf, bpf-next 7/9] perf tools: synthesize PERF_RECORD_* for loaded BPF programs

From: Song Liu
Date: Tue Jan 22 2019 - 13:45:15 EST




> On Jan 22, 2019, at 7:21 AM, Jiri Olsa <jolsa@xxxxxxxxxx> wrote:
>
> On Tue, Jan 22, 2019 at 12:58:05PM -0200, Arnaldo Carvalho de Melo wrote:
>> Em Tue, Jan 22, 2019 at 03:51:19PM +0100, Jiri Olsa escreveu:
>>> On Tue, Jan 22, 2019 at 12:31:17PM -0200, Arnaldo Carvalho de Melo wrote:
>>>> Em Tue, Jan 22, 2019 at 12:13:20PM -0200, Arnaldo Carvalho de Melo escreveu:
>>>>> Em Fri, Jan 18, 2019 at 11:46:55AM -0300, Arnaldo Carvalho de Melo escreveu:
>>>>>> Em Thu, Jan 17, 2019 at 08:15:19AM -0800, Song Liu escreveu:
>>>>>>> This patch synthesize PERF_RECORD_KSYMBOL and PERF_RECORD_BPF_EVENT for
>>>>>>> BPF programs loaded before perf-record. This is achieved by gathering
>>>>>>> information about all BPF programs via sys_bpf.
>>>>>>
>>>>>> Ditto
>>>>>
>>>>> This is breaking 'perf sched', see below, the fix seems trivial:
>>>>>
>>>>> [root@quaco ~]# perf sched record -a sleep 2
>>>>> [ perf record: Woken up 1 times to write data ]
>>>>> 0x5b60 [0x138]: failed to process type: 17
>>>>> [ perf record: Captured and wrote 1.539 MB perf.data ]
>>>>> [root@quaco ~]# perf sched lat
>>>>> 0x5b60 [0x138]: failed to process type: 17
>>>>> Failed to process events, error -22
>>>>> [root@quaco ~]#
>>>>
>>>> So:
>>>>
>>>> perf_session__process_event (event->header.type = 17 (PERF_RECORD_KSYMBOL)
>>>> if (tool->ordered_events)
>>>> ret = perf_evlist__parse_sample_timestamp(evlist, event, &timestamp);
>>>> if (ret && ret != -1)
>>>> return ret;
>>>>
>>>> So it returns here with -EFAULT, i.e. this is failing:
>>>>
>>>> int perf_evlist__parse_sample_timestamp(struct perf_evlist *evlist,
>>>> union perf_event *event,
>>>> u64 *timestamp)
>>>> {
>>>> struct perf_evsel *evsel = perf_evlist__event2evsel(evlist, event);
>>>>
>>>> if (!evsel)
>>>> return -EFAULT;
>>>> return perf_evsel__parse_sample_timestamp(evsel, event, timestamp);
>>>> }
>>>>
>>>> It isn't mapping the event ID it finds back to an evsel.. Jiri, ideas?
>>>>
>>>> This is happening so far only for 'perf sched', perf record with two
>>>> events works.
>>>
>>> I saw also perf mem failing because of this.. will check
>>
>> Right, seems something with the synthesizing of existing bpf progs,
>> which always there are some nowadays, for instance, on this fedora29
>> system:
>>
>> [root@quaco tmp]# bpftool prog
>> 13: cgroup_skb tag 7be49e3934a125ba gpl
>> loaded_at 2019-01-21T13:30:27-0200 uid 0
>> xlated 296B jited 229B memlock 4096B map_ids 13,14
>> 14: cgroup_skb tag 2a142ef67aaad174 gpl
>> loaded_at 2019-01-21T13:30:27-0200 uid 0
>> xlated 296B jited 229B memlock 4096B map_ids 13,14
>> 15: cgroup_skb tag 7be49e3934a125ba gpl
>> loaded_at 2019-01-21T13:30:27-0200 uid 0
>> xlated 296B jited 229B memlock 4096B map_ids 15,16
>> 16: cgroup_skb tag 2a142ef67aaad174 gpl
>> loaded_at 2019-01-21T13:30:27-0200 uid 0
>> xlated 296B jited 229B memlock 4096B map_ids 15,16
>> 17: cgroup_skb tag 7be49e3934a125ba gpl
>> loaded_at 2019-01-21T13:30:29-0200 uid 0
>> xlated 296B jited 229B memlock 4096B map_ids 17,18
>> 18: cgroup_skb tag 2a142ef67aaad174 gpl
>> loaded_at 2019-01-21T13:30:29-0200 uid 0
>> xlated 296B jited 229B memlock 4096B map_ids 17,18
>> 21: cgroup_skb tag 7be49e3934a125ba gpl
>> loaded_at 2019-01-21T13:30:29-0200 uid 0
>> xlated 296B jited 229B memlock 4096B map_ids 21,22
>> 22: cgroup_skb tag 2a142ef67aaad174 gpl
>> loaded_at 2019-01-21T13:30:29-0200 uid 0
>> xlated 296B jited 229B memlock 4096B map_ids 21,22
>> [root@quaco tmp]#
>>
>> When with a bunch of tracepoints, that is what 'perf mem', 'perf kmem',
>> 'perf sched', etc does, it ends up failing here:
>>
>> ret = perf_evlist__parse_sample_timestamp(evlist, event, &timestamp);
>>
>> So it is not passed to
>>
>> ret = perf_session__queue_event(session, event, timestamp, file_offset);
>>
>> in perf_session__process_event, this happens right when processing
>> buildids in 'perf record', and also in 'perf report', so that is
>> something badly synthesized that hits perf.data for PERF_RECORD_KSYMBOL.
>
> it's reproducible with simple:
> perf record -e cycles,instructions ls
>
> as you said on irc, it's the machine->id_hdr_size size missing
> there's one more glitch, attached patch fixes that for me
> you can't use sizeof(struct ksymbol_event), because it includes
> the name as well.. which screws the size
>
> but I don't know that code that much.. might be still something
> missing
>
> jirka

Hi Arnaldo and Jiri,

Thanks for catching and fixing the bug.

I guess the following is OK?

*bpf_event = (struct bpf_event){
.header = {
.type = PERF_RECORD_BPF_EVENT,
.size = sizeof(struct bpf_event),
},
.type = PERF_BPF_EVENT_PROG_LOAD,
.flags = 0,
.id = info.id,
};

as struct bpf_event doesn't have variable length name:

struct bpf_event {
struct perf_event_header header;
u16 type;
u16 flags;
u32 id;

/* for bpf_prog types */
u8 tag[BPF_TAG_SIZE]; // prog tag
};

Or we need similar fix?

Song