Re: [RFC/PATCH 00/14] perf record: Add support to store data in directory

From: Song Liu
Date: Mon Feb 11 2019 - 15:31:01 EST

> On Feb 11, 2019, at 11:30 AM, Stephane Eranian <eranian@xxxxxxxxxx> wrote:
>
> Arnaldo,
>
> On Mon, Feb 11, 2019 at 10:55 AM Arnaldo Carvalho de Melo
> <acme@xxxxxxxxxx> wrote:
>>
>> Em Mon, Feb 11, 2019 at 10:34:16AM -0800, Stephane Eranian escreveu:
>>> Jiri,
>>>
>>> On Mon, Feb 11, 2019 at 2:20 AM Jiri Olsa <jolsa@xxxxxxxxxx> wrote:
>>>>
>>>> On Tue, Feb 05, 2019 at 02:37:27PM +0100, Jiri Olsa wrote:
>>>>> On Mon, Feb 04, 2019 at 02:44:37PM -0800, Stephane Eranian wrote:
>>>>>> Jiri,
>>>>>>
>>>>>> While you're looking at the output format, I think it would be a good
>>>>>> time to simplify the code handling the perf.data file.
>>>>>> Today, perf record can emit in two formats: file mode or pipe mode.
>>>>>> This adds complexity in the code and
>>>>>> is error prone as the file mode path is tested more than the pipe mode
>>>>>> path. We have run into multiple issues with
>>>>>> the pipe mode in recent years. There is no real reason why we need to
>>>>>> maintain two formats. If I recall, the pipe format
>>>>>> was introduced because on pipes you cannot lseek to update the headers
>>>>>> and therefore some of the information present as tables
>>>>>> updated on the fly needed to be generated as pseudo records by the
>>>>>> tool. I believe that the pipe format covers all the needs and could
>>>>>> supersede the file mode format. That would simplify code in perf
>>>>>> record and eliminate the risk of errors when new headers
>>>>>> are introduced.
>>>>>
>>>>> yep, I think we have almost all the features covered for pipe mode,
>>>>> and we have all necessary events to describe events features
>>>>>
>>>>> so with some effort we could switch off the superfluous file header
>>>>> and use only events to describe events ;-) make sense, I'll check
>>>>> on it
>>>>
>>>> so following features are not synthesized:
>>>>
>>>> FEAT_OPN(TRACING_DATA, tracing_data, false),
>>>> FEAT_OPN(BUILD_ID, build_id, false),
>>>> FEAT_OPN(BRANCH_STACK, branch_stack, false),
>>>> FEAT_OPN(AUXTRACE, auxtrace, false),
>>>> FEAT_OPN(STAT, stat, false),
>>>> FEAT_OPN(CACHE, cache, true),
>>>>
>>> What do you need for BRANCH_STACK?
>>>
>>>> I think all could be added and worked around with exception
>>>> of BUILD_ID, which we store at the end (after processing
>>>> all data) and we need it early in the report phase
>>>>
>>> Buildids are injected after the fact via perf inject when in pipe mode.
>>>
>>>> maybe it's time to re-think that buildid -> mmap event
>>>> association again, because it's a pain in the current implementation
>>>> as well
>>>>
>>> Sure, but what do you propose?
>>
>> this keeps resurfacing, the idea is to have the buildid go together
>> with the PERF_RECORD_MMAP3 event, i.e. as part of setting up an
>> executable mapping the loader would get the buildid and ask the kernel
>> to keep it around, then when a PERF_RECORD_MMAP needs to be issued, it
>> can include the build id, so tooling will not need to get it.
>>
> And how would the dynamic loader (ld.so) communicate the buildid to the kernel?
> How would that work for statically linked binaries?
> I think you're saying the kernel would parse the ELF header looking for
> that note section
> and extract the buildid from there. Is that what you are proposing?

The kernel already parses the ELF file for the build ID on the BPF
side. You can find the code in stack_map_get_build_id_offset() and the
functions it calls.
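
For anyone who has not read that code, the core of it is a walk over
ELF note records looking for the GNU build-id note. Below is a minimal
user-space sketch of that kind of walk, not the kernel implementation.
find_build_id() and struct note_hdr are hypothetical names; the sketch
assumes the caller has already located the note data in the file and
that the notes use the host's byte order.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Layout of an ELF note header (same fields as Elf32_Nhdr/Elf64_Nhdr). */
struct note_hdr {
	uint32_t namesz;	/* length of name, including the trailing NUL */
	uint32_t descsz;	/* length of the descriptor (the build ID) */
	uint32_t type;		/* note type */
};

#define SKETCH_NT_GNU_BUILD_ID 3u	/* value of NT_GNU_BUILD_ID */

/* Name and descriptor are each padded to a 4-byte boundary. */
static size_t note_align(size_t n)
{
	return (n + 3) & ~(size_t)3;
}

/*
 * Scan a buffer of note records for the GNU build-id note.
 * Returns the build-id length and copies it to 'out', or -1 if the
 * note is missing, malformed, or larger than 'out_cap'.
 */
int find_build_id(const unsigned char *notes, size_t len,
		  unsigned char *out, size_t out_cap)
{
	size_t off = 0;

	while (off <= len && len - off >= sizeof(struct note_hdr)) {
		struct note_hdr nh;
		size_t remaining, name_sz, desc_sz, name_off, desc_off;

		memcpy(&nh, notes + off, sizeof(nh));
		remaining = len - off - sizeof(nh);

		name_sz = note_align(nh.namesz);
		if (nh.namesz > remaining || name_sz > remaining)
			return -1;	/* name runs past the buffer */
		remaining -= name_sz;
		if (nh.descsz > remaining)
			return -1;	/* descriptor runs past the buffer */

		name_off = off + sizeof(nh);
		desc_off = name_off + name_sz;

		if (nh.type == SKETCH_NT_GNU_BUILD_ID && nh.namesz == 4 &&
		    memcmp(notes + name_off, "GNU", 4) == 0) {
			if (nh.descsz == 0 || nh.descsz > out_cap)
				return -1;
			memcpy(out, notes + desc_off, nh.descsz);
			return (int)nh.descsz;
		}

		desc_sz = note_align(nh.descsz);
		if (desc_sz > remaining)
			break;		/* last note, no trailing padding */
		off = desc_off + desc_sz;
	}
	return -1;
}
```

A caller would map the ELF file, find the note data from the program
or section headers, and pass that range to find_build_id().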

>
>> Alternatively, we would have a separate thread to process
>> PERF_RECORD_MMAP events, and as soon as it gets one from the kernel,
>> augment it straight away with the build-id it reads from the ELF file,
>> i.e. no need to have the kernel provide it, do it just like we do with
>> PERF_RECORD_BPF_EVENT, which reminds me Song probably already posted
>> those bits...
>>
> But that would not work in pipe mode, would it?
> Unless that thread intercepts everything pushed to the pipe looking
> for MMAP records.

For PERF_RECORD_BPF_EVENT, I am adding a separate thread, which only
listens for PERF_RECORD_BPF_EVENT, with a watermark of 1. This means
each PERF_RECORD_BPF_EVENT is sent to two ring buffers: one is written
to the pipe, and the other is processed only by the listening thread.
Please see https://patchwork.ozlabs.org/patch/1039091/ for details.
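
To make the "watermark of 1" part concrete, here is a rough sketch of
how the attributes for such a side-band event could be filled in. It
is an illustration under assumptions, not the code from the patch:
init_sideband_attr() is a hypothetical helper, the use of a dummy
software event is my choice for the sketch, and the bpf_event bit
needs uapi headers that already carry PERF_RECORD_BPF_EVENT support.

```c
#include <linux/perf_event.h>
#include <string.h>

/*
 * Hypothetical helper: set up an event that counts nothing itself and
 * exists only to receive PERF_RECORD_BPF_EVENT records in its own
 * ring buffer.
 */
void init_sideband_attr(struct perf_event_attr *attr)
{
	memset(attr, 0, sizeof(*attr));
	attr->size = sizeof(*attr);
	attr->type = PERF_TYPE_SOFTWARE;
	attr->config = PERF_COUNT_SW_DUMMY;	/* no samples of its own */
	attr->sample_id_all = 1;
	attr->bpf_event = 1;			/* ask for PERF_RECORD_BPF_EVENT */

	/*
	 * watermark = 1 selects wakeup_watermark (bytes) instead of
	 * wakeup_events; a value of 1 wakes the reader as soon as any
	 * data lands in the ring buffer.
	 */
	attr->watermark = 1;
	attr->wakeup_watermark = 1;
}
```

The filled-in attr would then go to perf_event_open(2), with the
resulting ring buffer mmap'ed and polled from the dedicated thread.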

Thanks,
Song

>
>>>> looks like bpf code is actually getting build ids and storing
>>>> it for the callchains in kernel.. we can check if we can do
>>>> something similar for mmap events
>>>>
>>>> jirka
>>
>> --
>>
>> - Arnaldo