Re: [PATCH v3 00/23] Improvements to Intel perf metrics

From: Ian Rogers
Date: Tue Oct 04 2022 - 13:56:22 EST


On Tue, Oct 4, 2022 at 10:29 AM Andi Kleen <ak@xxxxxxxxxxxxxxx> wrote:
>
> [cutting down cc list]
>
>
> On 10/3/2022 8:43 PM, Ian Rogers wrote:
> > On Mon, Oct 3, 2022 at 7:16 PM Ian Rogers <irogers@xxxxxxxxxx> wrote:
> >> For consistency with:
> >> https://github.com/intel/perfmon-metrics
> >> the topdown TMA metrics are renamed from Frontend_Bound to tma_frontend_bound.
> >>
> >> The _SMT suffix metrics are dropped as #SMT_On and #EBS_Mode
> >> are correctly expanded in the single main metric. Fix perf expr to
> >> allow a double if to be correctly processed.
> >>
> >> Add all 6 levels of TMA metrics. Child metrics are placed in a group
> >> named after their parent allowing children of a metric to be
> >> easily measured using the metric name with a _group suffix.
> >>
> >> Don't drop TMA metrics if they contain topdown events.
> >>
> >> The ## and ##? operators are correctly expanded.
> >>
> >> The locate-with column is added to the long description describing a
> >> sampling event.
> >>
> >> Metrics are written in terms of other metrics to reduce the expression
> >> size and increase readability.
> >>
> >> Following this the pmu-events/arch/x86 directories match those created
> >> by the script at:
> >> https://github.com/intel/event-converter-for-linux-perf/blob/master/download_and_gen.py
> >> with updates at:
> >> https://github.com/captain5050/event-converter-for-linux-perf
> >>
> >>
> >> v3. Fix a parse metrics test failure due to making metrics referring
> >> to other metrics case sensitive - make the cases in the test
> >> metric match.
> >> v2. Fixes commit message wrt missing mapfile.csv updates as noted by
> >> Zhengjun Xing <zhengjun.xing@xxxxxxxxxxxxxxx>. ScaleUnit is added
> >> for TMA metrics. Metrics with topdown events have a missing
> >> slots event added if necessary. The latest metrics at:
> >> https://github.com/intel/perfmon-metrics are used, however, the
> >> event-converter-for-linux-perf scripts now prefer their own
> >> metrics in case of mismatched units when a metric is written in
> >> terms of another. Additional testing was performed on broadwell,
> >> broadwellde, cascadelakex, haswellx, sapphirerapids and tigerlake
> >> CPUs.
> > I wrote up a little example of performing a top-down analysis for the
> > perf wiki here:
> > https://perf.wiki.kernel.org/index.php/Top-Down_Analysis
>
>
> I did some quick testing.
>
> On Skylake the output of L1 isn't scaled to percent:
>
> $ ./perf stat -M TopdownL1 ~/pmu/pmu-tools/workloads/BC1s
>
> Performance counter stats for '/home/ak/pmu/pmu-tools/workloads/BC1s':
>
>        608,066,701      INT_MISC.RECOVERY_CYCLES    #     0.32 Bad_Speculation           (50.02%)
>      5,364,230,382      CPU_CLK_UNHALTED.THREAD     #     0.48 Retiring                  (50.02%)
>     10,194,062,626      UOPS_RETIRED.RETIRE_SLOTS                                        (50.02%)
>     14,613,100,390      UOPS_ISSUED.ANY                                                  (50.02%)
>      2,928,793,077      IDQ_UOPS_NOT_DELIVERED.CORE #     0.14 Frontend_Bound
>                                                     #     0.07 Backend_Bound             (50.02%)
>        604,850,703      INT_MISC.RECOVERY_CYCLES                                         (50.02%)
>      5,357,291,185      CPU_CLK_UNHALTED.THREAD                                          (50.02%)
>     14,618,285,580      UOPS_ISSUED.ANY                                                  (50.02%)

Did you build Arnaldo's perf/core branch with the changes applied? The
metrics shown here should be named tma_bad_speculation, tma_retiring,
tma_frontend_bound and tma_backend_bound.
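
A quick way to sanity check which binary is being run - assuming the
renamed metrics show up in 'perf list' output once the series is
applied - is just to grep for one of them:

$ ./perf list 2>/dev/null | grep -i tma_backend_bound

If that prints nothing, the perf being run doesn't have the renamed
metrics.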

Looking at:
https://lore.kernel.org/lkml/20221004021612.325521-22-irogers@xxxxxxxxxx/

+ "MetricExpr": "1 - tma_frontend_bound - (UOPS_ISSUED.ANY + 4
* ((INT_MISC.RECOVERY_CYCLES_ANY / 2) if #SMT_on else
INT_MISC.RECOVERY_CYCLES)) / SLOTS",
+ "MetricGroup": "TopdownL1;tma_L1_group",
+ "MetricName": "tma_backend_bound",
+ "PublicDescription": "This category represents fraction of
slots where no uops are being delivered due to a lack of required
resources for accepting new uops in the Backend. Backend is the
portion of the processor core where the out-of-order scheduler
dispatches ready uops into their respective execution units; and once
completed these uops get retired according to program order. For
example; stalls due to data-cache misses or stalls due to the divider
unit being overloaded are both categorized under Backend Bound.
Backend Bound is further divided into two main categories: Memory
Bound and Core Bound.",
+ "ScaleUnit": "100%"

So it doesn't make sense to me that the scaling would be missing. FWIW,
I did test on SkylakeX but used Tigerlake for the wiki due to potential
clock domain issues with SLOTS.

> Then if I follow the wiki example here I would expect I need to do
>
> $ ./perf stat -M tma_backend_bound_group ~/pmu/pmu-tools/workloads/BC1s
>
> Cannot find metric or group `tma_backend_bound_group'
>
> but tma_retiring_group doesn't exist. So it seems the methodology isn't
> fully consistent everywhere? Perhaps the wiki needs to document the
> supported CPUs and also what part of the hierarchy is supported.

So I think you've not got Arnaldo's branch with the changes applied.
Unfortunately the instructions around '_group' are only going to apply
to Linux 6.1.
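
One way to see what a given binary supports - assuming that build of
perf accepts 'metricgroup' as an argument to perf list - is to list the
metric groups and look for the per-parent '_group' entries:

$ ./perf list metricgroup 2>/dev/null | grep tma_

On a tree without the series applied that should come back empty.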

> Another problem I noticed in the example is that the sample event didn't
> specify PEBS, even though it probably should, at least on Icelake+ where
> every event can be used with less overhead with PEBS.

The 'Sample with' text is just descriptive text. We can change it or
put something on the wiki; what would you suggest?

> Also, with all these groups that need to be specified by hand, some bash
> completion support for groups would be really useful.

Ack. My expectation is that everyone starts with TopdownL1 and goes
from there, adding '_group' to the metric they want to drill into.
There are 104 topdown metrics and I'm not sure how useful expanding
all of these would be. On Icelake+ this becomes muddy due to the
unconditional printing of topdown metrics in the midst of the
regularly computed metrics; this can be seen on the wiki:
https://perf.wiki.kernel.org/index.php/Top-Down_Analysis
For example, when the level 2 metric group tma_backend_bound_group is
given, the level 1 metrics Retiring, Frontend Bound, Backend Bound and
Bad Speculation are also displayed.
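
To make the drill down concrete, what I have in mind is roughly the
following (a sketch only - './workload' is a placeholder and the lower
level group names, e.g. tma_memory_bound_group, depend on which levels
exist for the CPU):

$ ./perf stat -M TopdownL1 ./workload
$ ./perf stat -M tma_backend_bound_group ./workload
$ ./perf stat -M tma_memory_bound_group ./workload

i.e. each step appends '_group' to whichever metric dominated in the
previous step.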

Thanks,
Ian

> -Andi
>
>