Re: perf/x86/intel: Collecting CPU-local performance counters from all cores in parallel

From: Andi Kleen
Date: Tue May 23 2017 - 16:54:00 EST


Michael Edwards <michael@xxxxxxxxxx> writes:
>
> Am I going about this wrong?

It seems like a reasonable optimization, but it's likely a lot of work.

> Is there some better way to pursue the
> high-level goal of gathering PMC-based statistics frequently and
> efficiently from all cores, without breaking everything else that uses
> perf_events?

If you can drive the collection from a performance counter
(e.g. reference cycles) you could use leader sampling, and let the
PMIs log the values to the mmap'ed ring buffer. This should
be vastly more efficient than pulling everything. This works today,
however there are still some scaling problems with many groups.

perf record -F frequency -e '{cpu/ref-cycles/,<three other
events to collect>}:S,... more groups like this ...' -a ...
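
If there are many events, the -e argument gets tedious to write by
hand. A rough sketch of generating it, as an illustration only: the
helper names, the group size of four (leader plus three, as above) and
the event names in the example are assumptions, not something perf
provides or requires.

```python
import shlex


def leader_sampling_groups(leader, events, group_size=4):
    """Build a perf -e argument from groups that all share one leader.

    Each group is "{leader,ev1,ev2,...}:S". The :S modifier asks perf to
    read the other group members whenever the leader takes a sample.
    group_size counts the leader, so each group holds group_size - 1
    additional events (an assumed limit, adjust for your PMU).
    """
    per_group = group_size - 1
    if per_group < 1:
        raise ValueError("group_size must leave room for at least one event")
    groups = []
    for i in range(0, len(events), per_group):
        members = [leader] + list(events[i:i + per_group])
        groups.append("{" + ",".join(members) + "}:S")
    return ",".join(groups)


def perf_record_argv(freq, leader, events, group_size=4):
    """Return an argv list for a system-wide (-a) perf record run."""
    return [
        "perf", "record",
        "-F", str(freq),
        "-e", leader_sampling_groups(leader, events, group_size),
        "-a",
    ]


if __name__ == "__main__":
    # Placeholder event names; substitute the events you want to collect.
    example_events = ["instructions", "cache-misses", "branches",
                      "branch-misses", "cache-references"]
    argv = perf_record_argv(1000, "cpu/ref-cycles/", example_events)
    # Only print the command; shlex.join adds the shell quoting.
    print(shlex.join(argv))
```

This only builds and prints the command line, it does not run perf.
Every group repeats the leader, so more events means more groups, which
is where the scaling problems mentioned above come in.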

-Andi