Re: [PATCH v8 0/3]: perf: reduce data loss when profiling highly parallel CPU bound workloads

From: Alexey Budankov
Date: Wed Sep 12 2018 - 04:27:42 EST

Next message: Sahitya Tummala: "Re: [f2fs-dev] [PATCH v2] f2fs: add new idle interval timing for discard and gc paths"
Previous message: Cédric Le Goater: "Re: [PATCH i2c-next v6] i2c: aspeed: Handle master/slave combined irq events properly"
In reply to: Peter Zijlstra: "Re: [PATCH v8 0/3]: perf: reduce data loss when profiling highly parallel CPU bound workloads"

Hi,

On 11.09.2018 17:19, Peter Zijlstra wrote:
> On Tue, Sep 11, 2018 at 08:35:12AM +0200, Ingo Molnar wrote:
>>> Well, explicit threading in the tool for AIO, in the simplest case, means
>>> incorporating some POSIX API implementation into the tool, avoiding
>>> code reuse in the first place. That tends to be error prone and costly.
>>
>> It's a core competency, we better do it right and not outsource it.
>>
>> Please take a look at Jiri's patches (once he re-posts them), I think it's a very good
>> starting point.
>
> There's another reason for doing custom per-cpu threads; it avoids
> bouncing the buffer memory around the machine. If the task doing the
> buffer reads is the exact same as the one doing the writes, there's less
> memory traffic on the interconnects.

Yeah, NUMA does matter. Memory locality, i.e. cache sizes and NUMA domains
for kernel/user buffer allocation, needs to be taken into account by an
effective solution. Luckily, no data loss has been observed when testing
matrix multiplication on 96-core dual-socket machines.
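
To illustrate the locality point, here is a minimal sketch of a reader
thread pinned to one CPU, assuming Linux with glibc. The names reader_arg,
reader_thread and run_pinned_reader are hypothetical and are not taken from
the patches under discussion; the actual buffer draining is left out.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE   /* for pthread_setaffinity_np() and sched_getcpu() */
#endif
#include <pthread.h>
#include <sched.h>

/* Hypothetical per-reader state; not a structure from the perf tool. */
struct reader_arg {
	int cpu;      /* CPU this reader should run on */
	int pin_err;  /* return value of pthread_setaffinity_np() */
	int ran_on;   /* CPU observed after pinning, or -1 if pinning failed */
};

static void *reader_thread(void *p)
{
	struct reader_arg *arg = p;
	cpu_set_t set;

	/* Restrict this thread to the single CPU whose buffer it serves. */
	CPU_ZERO(&set);
	CPU_SET(arg->cpu, &set);
	arg->pin_err = pthread_setaffinity_np(pthread_self(), sizeof(set), &set);
	arg->ran_on = arg->pin_err ? -1 : sched_getcpu();

	/* A real reader would drain the ring buffer of arg->cpu here. */
	return NULL;
}

/* Start one pinned reader and wait for it; returns 0 on success. */
static int run_pinned_reader(struct reader_arg *arg)
{
	pthread_t tid;
	int err = pthread_create(&tid, NULL, reader_thread, arg);

	if (err)
		return err;
	return pthread_join(tid, NULL);
}
```

A caller would fill in reader_arg.cpu for each online CPU and start one such
thread per CPU, so that the buffer is read on the same CPU that wrote it.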

>
> Also, I think we can avoid the MFENCE in that case, but I'm not sure
> that one is hot enough to bother about on the perf reading side of
> things.

Yep, *FENCE may be costly in HW, especially at larger scale.

Thanks,
Alexey
