Re: [RFC PATCH v3 2/2] perf: Add option for --per-cache aggregation

From: K Prateek Nayak
Date: Thu Apr 13 2023 - 12:59:49 EST


Hello Ian,

Thank you for reviewing the series.

On 4/13/2023 10:18 PM, Ian Rogers wrote:
> On Wed, Apr 12, 2023 at 11:22 PM K Prateek Nayak <kprateek.nayak@xxxxxxx> wrote:
>>
>> Processors based on chiplet architecture, such as AMD EPYC and Hygon,
>> do not expose the chiplet details in the sysfs CPU topology
>> information. However, this information can be derived from the per-CPU
>> cache information available in sysfs.
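
For context, the per-CPU cache information that the IDs are derived from
lives in the standard sysfs cache topology files; the output below is
illustrative:

$ cat /sys/devices/system/cpu/cpu0/cache/index3/level
3
$ cat /sys/devices/system/cpu/cpu0/cache/index3/shared_cpu_list
0-7

CPUs that report the same shared_cpu_list at a given level belong to the
same cache instance at that level.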
>>
>> perf stat already supports aggregation based on topology information
>> such as the core ID, socket ID, etc. It is also useful to aggregate
>> based on the cache topology to detect problems such as imbalance and
>> cache-to-cache sharing at various cache levels.
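
For reference, the existing topology-based aggregation modes that
"--per-cache" extends are invoked as follows (illustrative invocations):

$ sudo perf stat --per-socket -a -e cycles -- sleep 5
$ sudo perf stat --per-core -a -e cycles -- sleep 5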
>>
>> This patch adds support for the "--per-cache" option to aggregate at a
>> particular cache level. It also updates the docs and the related
>> tests. The output looks like:
>>
>> $ sudo perf stat --per-cache=L3 -a -e ls_dmnd_fills_from_sys.ext_cache_remote -- sleep 5
>>
>> Performance counter stats for 'system wide':
>>
>> S0-D0-L3-ID0    16         12,644,599      ls_dmnd_fills_from_sys.ext_cache_remote
>> S0-D0-L3-ID8    16         13,847,598      ls_dmnd_fills_from_sys.ext_cache_remote
>> S0-D0-L3-ID16   16            223,592      ls_dmnd_fills_from_sys.ext_cache_remote
>> S0-D0-L3-ID24   16              7,775      ls_dmnd_fills_from_sys.ext_cache_remote
>> S0-D0-L3-ID32   16             31,302      ls_dmnd_fills_from_sys.ext_cache_remote
>> S0-D0-L3-ID40   16             17,722      ls_dmnd_fills_from_sys.ext_cache_remote
>> S0-D0-L3-ID48   16              8,272      ls_dmnd_fills_from_sys.ext_cache_remote
>> S0-D0-L3-ID56   16              5,765      ls_dmnd_fills_from_sys.ext_cache_remote
>> S1-D1-L3-ID64   16         18,227,173      ls_dmnd_fills_from_sys.ext_cache_remote
>> S1-D1-L3-ID72   16         20,926,878      ls_dmnd_fills_from_sys.ext_cache_remote
>> S1-D1-L3-ID80   16             13,705      ls_dmnd_fills_from_sys.ext_cache_remote
>> S1-D1-L3-ID88   16             24,062      ls_dmnd_fills_from_sys.ext_cache_remote
>> S1-D1-L3-ID96   16             27,891      ls_dmnd_fills_from_sys.ext_cache_remote
>> S1-D1-L3-ID104  16             13,480      ls_dmnd_fills_from_sys.ext_cache_remote
>> S1-D1-L3-ID112  16             12,656      ls_dmnd_fills_from_sys.ext_cache_remote
>> S1-D1-L3-ID120  16             21,881      ls_dmnd_fills_from_sys.ext_cache_remote
>>
>> Also add support for "perf stat record" and "perf stat report", with
>> the ability to specify a different cache level to aggregate the data
>> at when running "perf stat report".
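
For example, the record/report flow described above could look like this
(illustrative invocation; the report-time level need not match the
record-time level):

$ sudo perf stat record --per-cache=L3 -a -e cycles -- sleep 5
$ sudo perf stat report --per-cache=L2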
>>
>> Suggested-by: Gautham R. Shenoy <gautham.shenoy@xxxxxxx>
>> Signed-off-by: K Prateek Nayak <kprateek.nayak@xxxxxxx>
>> ---
>> Changelog:
>> o v2->v3
>> - Cache IDs are now derived from the shared_cpu_list, making
>>   aggregation possible on older kernels that do not expose the IDs.
>> - Updated the comments based on the new ID assignment strategy.
>> - Better handle the case where the specified level is valid (less than
>>   or equal to MAX_CACHE_LVL) but does not exist on the machine. In
>>   such cases, the ID will be -1.
>>
>> $ sudo perf stat --per-cache=L4 -a -e cycles -- sleep 5
>>
>> Performance counter stats for 'system wide':
>>
>> S0-D0-L4-ID-1   128         51,328,613      cycles
>> S1-D1-L4-ID-1   128        125,132,221      cycles
>>
>> o v1->v2
>> - Fix the cache level parsing logic. Previously, "--per-cache=2" was
>>   considered valid, but that was not the intention. Only parse strings
>>   of the form "LX" or "lX", where X is the cache level and can range
>>   from 1 to MAX_CACHE_LVL.
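
To illustrate the stricter parsing (illustrative invocations):

$ sudo perf stat --per-cache=L3 -a -e cycles -- sleep 5   # accepted
$ sudo perf stat --per-cache=3 -a -e cycles -- sleep 5    # now rejected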
>> ---
>> tools/lib/perf/include/perf/cpumap.h          |   5 +
>> tools/lib/perf/include/perf/event.h           |   3 +-
>> tools/perf/Documentation/perf-stat.txt        |  16 ++
>> tools/perf/builtin-stat.c                     | 144 +++++++++++++++++-
>> .../tests/shell/lib/perf_json_output_lint.py  |   4 +-
>> tools/perf/tests/shell/stat+csv_output.sh     |  14 ++
>> tools/perf/tests/shell/stat+json_output.sh    |  13 ++
>
> I think you can break this patch apart. You can add the feature, then
> enable the CSV test, then the json test, etc. When adding the feature
> you can also probably split the changes of "struct aggr_cpu_id" from
> the display/enablement logic.

I agree this patch is huge. I'll break it down as per your suggestion in
the next version, where I also plan to drop the RFC tag.

> Overall it looks good!

Thank you again for taking a look at the series.

>
> Thanks,
> Ian
>
>> [..snip..]
>>

--
Thanks and Regards,
Prateek