Re: [PATCH v8 0/3]: perf: reduce data loss when profiling highly parallel CPU bound workloads

From: Alexey Budankov
Date: Tue Sep 11 2018 - 09:42:16 EST

In reply to: Jiri Olsa: "Re: [PATCH v8 0/3]: perf: reduce data loss when profiling highly parallel CPU bound workloads"
Next in thread: Jiri Olsa: "Re: [PATCH v8 0/3]: perf: reduce data loss when profiling highly parallel CPU bound workloads"

Hi,

On 11.09.2018 11:34, Jiri Olsa wrote:
> On Tue, Sep 11, 2018 at 11:16:45AM +0300, Alexey Budankov wrote:
>>
>> Hi Ingo,
>>
>> On 11.09.2018 9:35, Ingo Molnar wrote:
>>>
>>> * Alexey Budankov <alexey.budankov@xxxxxxxxxxxxxxx> wrote:
>>>
>>>> It may sound too optimistic, but the glibc API is expected to stay backward compatible,
>>>> and that includes the POSIX AIO part. The internal implementation also tends to evolve
>>>> toward better options over time, most probably based on the modern kernel capabilities
>>>> mentioned here: http://man7.org/linux/man-pages/man2/io_submit.2.html
>>>
>>> I'm not talking about compatibility, and I'm not just talking about glibc, perf works under
>>> other libcs as well - and let me phrase it in another way: basic event handling, threading,
>>> scheduling internals should be a *core competency* of a tracing/profiling tool.
>>
>> Well, the requirement of independence from some specific libc implementation
>> as well as *core competency* design approach clarify a lot. Thanks!
>>
>>>
>>> I.e. we might end up using the exact same per event fd thread pool design that glibc uses
>>> currently. Or not. Having that internal and open coded to perf, like Jiri has started
>>> implementing it, allows people to experiment with it.
>>
>> My point here is that following some standardized programming models and APIs
>> (like POSIX) in the tool code, even if the tool itself provides internal open
>> coded implementation for the APIs, would simplify experimenting with the tool
>> as well as lower barriers for newcomers. The perf project could benefit from that.
>>
>>>
>>> This isn't some GUI toolkit, this is at the essence of perf, and we are not very good on large
>>> systems right now, and I think the design should be open-coded threading, not relying on an
>>> (perf-)external AIO library to get it right.
>>>
>>> The glibc thread pool implementation of POSIX AIO is basically a fall-back
>>> implementation, for the case where there's no native KAIO interface to rely on.
>>>
>>>> Well, explicit threading in the tool for AIO, in the simplest case, means
>>>> incorporating some POSIX API implementation into the tool, avoiding
>>>> code reuse in the first place. That tends to be error prone and costly.
>>>
>>> It's a core competency, we better do it right and not outsource it.
>>
>> Yep, makes sense.
>
> on the other hand, we are already trying to tie this up under perf_mmap
> object, which is what the threaded patchset operates on.. so I'm quite
> confident that with little effort we could make those 2 things live next
> to each other and let the user decide which one to take and compare
>
> possibilities would be like: (not sure yet the last one makes sense, but still..)
>
> # perf record --threads=... ...
> # perf record --aio ...
> # perf record --threads=... --aio ...
>
> how about that?

That might be an option. What are the semantics of --threads?

Be aware that when experimenting with serial trace writing on an 8-core
client machine running an HPC benchmark that heavily utilized all 8 cores,
we noticed that the single perf tool thread contended with the benchmark
threads.

That manifested as libiomp.so (Intel OpenMP implementation) functions
appearing among the top hotspot functions, which indicated an
imbalance induced by the tool during profiling.

That's why we decided to go with the AIO approach first, as posted,
and to get the most out of it through multi AIO, before turning to the
more resource-consuming multi-threading alternative.
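
For anyone not familiar with the POSIX AIO interface discussed here, below
is a minimal sketch of the general idea: queue a chunk of trace data for
writing and collect the result later, so the calling thread is not blocked
in write(). This is not the code from the patch set. The helper names
trace_write_start() and trace_write_wait() are hypothetical and exist only
for this illustration; the only real API used is aio_write(), aio_error(),
aio_suspend() and aio_return() from <aio.h>.

```c
#include <aio.h>
#include <errno.h>
#include <signal.h>
#include <string.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Hypothetical helper: queue one chunk of trace data for asynchronous
 * writing at file offset 'off'. Returns 0 if the request was queued.
 * The buffer and the control block must stay valid until completion.
 */
int trace_write_start(struct aiocb *cb, int fd, void *buf, size_t len, off_t off)
{
	memset(cb, 0, sizeof(*cb));
	cb->aio_fildes = fd;
	cb->aio_buf = buf;
	cb->aio_nbytes = len;
	cb->aio_offset = off;
	cb->aio_sigevent.sigev_notify = SIGEV_NONE; /* completion is polled */

	return aio_write(cb);
}

/*
 * Hypothetical helper: wait for a previously queued write to finish.
 * Returns the number of bytes written, or -1 with errno set.
 */
ssize_t trace_write_wait(struct aiocb *cb)
{
	const struct aiocb *list[1] = { cb };
	int err;

	while ((err = aio_error(cb)) == EINPROGRESS) {
		/* Sleep until the request completes; retry if interrupted. */
		if (aio_suspend(list, 1, NULL) && errno != EINTR)
			return -1;
	}
	if (err) {
		errno = err;
		return -1;
	}
	return aio_return(cb);
}
```

Between trace_write_start() and trace_write_wait() the caller is free to
do other work, e.g. drain the next buffer. Older glibc versions may need
-lrt at link time.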

>
> I just rebased the thread patchset, will make some tests (it's been few months,
> so it needs some kicking/checking) and post it out hopefully this week
>
> jirka
>
