[RFC PATCH 00/25] Perf stat metric grouping with hardware information

From: weilin . wang
Date: Mon Sep 25 2023 - 02:19:12 EST


From: Weilin Wang <weilin.wang@xxxxxxxxx>

Perf stat metric grouping generates event groups that are provided to kernel for
data collection using the hardware counters. Sometimes, the grouping might fail
and kernel has to retry the groups because generated groups do not fit in the
hardware counters correctly. In some other cases, the groupings are collected
correctly, however, they left some hardware counters unused.

To improve these inefficiencies, we would like to propose a hardware aware
grouping method that does metric/event grouping based on event counter
restriction rules and the availability of hardware counters in the system. This
method is generic as long as all the restriction rules could be provided from
the pmu-event JSON files.

This patch set includes code that does hardware aware grouping and updated
pmu-event JSON files for four platforms (SapphireRapids, Icelakex, Cascadelakex,
and Tigerlake) for your testing and experimenting. We've successfully tested
these patches on three platforms (SapphireRapids, Icelakex, and Cascadelakex)
with topdown metrics from TopdownL1 to TopdownL6.

There are some optimization opportunities that we might implement in the future:
1) Better NMI hanlding: when NMI watchdog is enabled, we reduce the default_core
total counter size by one. This could be improved to better utilize the counter.
2) Fill important events into unused counter for better counter utlization:
there might be some unused counters scattered in the groups. We could consider
to add important events in this slots if necessary. This could help increase the
multiplexing percentage and help improve accuracy if the event is critical.

Remaining questions for dicussion:
3) Where to start grouping from? The current implementation start grouping by
combining all the events into a single list. This step deduplicates events. But
it does not maintain the relationship of events according to the metrics, i.e.
events required by one metric may not be collected at the same time. Another
type of starting point would be grouping each individual metric and then try to
merge the groups.
4) Any comments, suggestions, new ideas?
5) If you are interested to test the patch out and the pmu-event JSON files of
your testing platform is not provided here, please let me know so that I could
provide you the files.


Weilin Wang (25):
perf stat: Add hardware-grouping cmd option to perf stat
perf stat: Add basic functions for the hardware-grouping stat cmd
option
perf pmu-events: Add functions in jevent.py
perf pmu-events: Add counter info into JSON files for SapphireRapids
perf pmu-events: Add event counter data for Cascadelakex
perf pmu-events: Add event counter data for Icelakex
perf stat: Add helper functions for hardware-grouping method
perf stat: Add functions to get counter info
perf stat: Add helper functions for hardware-grouping method
perf stat: Add helper functions to hardware-grouping method
perf stat: Add utility functions to hardware-grouping method
perf stat: Add more functions for hardware-grouping method
perf stat: Add functions to hardware-grouping method
perf stat: Add build string function and topdown events handling in
hardware-grouping
perf stat: Add function to combine metrics for hardware-grouping
perf stat: Update keyword core to default_core to adjust to the
changes for events with no unit
perf stat: Handle taken alone in hardware-grouping
perf stat: Handle NMI in hardware-grouping
perf stat: Handle grouping method fall back in hardware-grouping
perf stat: Code refactoring in hardware-grouping
perf stat: Add tool events support in hardware-grouping
perf stat: Add TSC support in hardware-grouping
perf stat: Fix a return error issue in hardware-grouping
perf stat: Add check to ensure correctness in platform that does not
support hardware-grouping
perf pmu-events: Add event counter data for Tigerlake

tools/lib/bitmap.c | 20 +
tools/perf/builtin-stat.c | 7 +
.../arch/x86/cascadelakex/cache.json | 1237 ++++++++++++
.../arch/x86/cascadelakex/counter.json | 17 +
.../arch/x86/cascadelakex/floating-point.json | 16 +
.../arch/x86/cascadelakex/frontend.json | 68 +
.../arch/x86/cascadelakex/memory.json | 751 ++++++++
.../arch/x86/cascadelakex/other.json | 168 ++
.../arch/x86/cascadelakex/pipeline.json | 102 +
.../arch/x86/cascadelakex/uncore-cache.json | 1138 +++++++++++
.../x86/cascadelakex/uncore-interconnect.json | 1272 +++++++++++++
.../arch/x86/cascadelakex/uncore-io.json | 394 ++++
.../arch/x86/cascadelakex/uncore-memory.json | 509 +++++
.../arch/x86/cascadelakex/uncore-power.json | 25 +
.../arch/x86/cascadelakex/virtual-memory.json | 28 +
.../pmu-events/arch/x86/icelakex/cache.json | 98 +
.../pmu-events/arch/x86/icelakex/counter.json | 17 +
.../arch/x86/icelakex/floating-point.json | 13 +
.../arch/x86/icelakex/frontend.json | 55 +
.../pmu-events/arch/x86/icelakex/memory.json | 53 +
.../pmu-events/arch/x86/icelakex/other.json | 52 +
.../arch/x86/icelakex/pipeline.json | 92 +
.../arch/x86/icelakex/uncore-cache.json | 965 ++++++++++
.../x86/icelakex/uncore-interconnect.json | 1667 +++++++++++++++++
.../arch/x86/icelakex/uncore-io.json | 966 ++++++++++
.../arch/x86/icelakex/uncore-memory.json | 186 ++
.../arch/x86/icelakex/uncore-power.json | 26 +
.../arch/x86/icelakex/virtual-memory.json | 22 +
.../arch/x86/sapphirerapids/cache.json | 104 +
.../arch/x86/sapphirerapids/counter.json | 17 +
.../x86/sapphirerapids/floating-point.json | 25 +
.../arch/x86/sapphirerapids/frontend.json | 98 +-
.../arch/x86/sapphirerapids/memory.json | 44 +
.../arch/x86/sapphirerapids/other.json | 40 +
.../arch/x86/sapphirerapids/pipeline.json | 118 ++
.../arch/x86/sapphirerapids/uncore-cache.json | 534 +++++-
.../arch/x86/sapphirerapids/uncore-cxl.json | 56 +
.../sapphirerapids/uncore-interconnect.json | 476 +++++
.../arch/x86/sapphirerapids/uncore-io.json | 373 ++++
.../x86/sapphirerapids/uncore-memory.json | 391 ++++
.../arch/x86/sapphirerapids/uncore-power.json | 24 +
.../x86/sapphirerapids/virtual-memory.json | 20 +
.../pmu-events/arch/x86/tigerlake/cache.json | 65 +
.../arch/x86/tigerlake/counter.json | 7 +
.../arch/x86/tigerlake/floating-point.json | 13 +
.../arch/x86/tigerlake/frontend.json | 56 +
.../pmu-events/arch/x86/tigerlake/memory.json | 31 +
.../pmu-events/arch/x86/tigerlake/other.json | 4 +
.../arch/x86/tigerlake/pipeline.json | 96 +
.../x86/tigerlake/uncore-interconnect.json | 11 +
.../arch/x86/tigerlake/uncore-memory.json | 6 +
.../arch/x86/tigerlake/uncore-other.json | 1 +
.../arch/x86/tigerlake/virtual-memory.json | 20 +
tools/perf/pmu-events/jevents.py | 179 +-
tools/perf/pmu-events/pmu-events.h | 26 +-
tools/perf/util/metricgroup.c | 927 +++++++++
tools/perf/util/metricgroup.h | 82 +
tools/perf/util/pmu.c | 5 +
tools/perf/util/pmu.h | 1 +
tools/perf/util/stat.h | 1 +
60 files changed, 13790 insertions(+), 25 deletions(-)
create mode 100644 tools/perf/pmu-events/arch/x86/cascadelakex/counter.json
create mode 100644 tools/perf/pmu-events/arch/x86/icelakex/counter.json
create mode 100644 tools/perf/pmu-events/arch/x86/sapphirerapids/counter.json
create mode 100644 tools/perf/pmu-events/arch/x86/tigerlake/counter.json

--
2.39.3