Re: [PATCH v1 04/20] perf jevents: Add tsx metric group for Intel models

From: Ian Rogers
Date: Fri Mar 01 2024 - 11:38:42 EST


On Fri, Mar 1, 2024 at 6:52 AM Liang, Kan <kan.liang@xxxxxxxxxxxxxxx> wrote:
>
>
>
> On 2024-02-29 8:01 p.m., Ian Rogers wrote:
> > On Thu, Feb 29, 2024 at 1:15 PM Liang, Kan <kan.liang@xxxxxxxxxxxxxxx> wrote:
> >>
> >>
> >>
> >> On 2024-02-28 7:17 p.m., Ian Rogers wrote:
> >>> Allow duplicated metric to be dropped from json files.
> >>>
> >>> Signed-off-by: Ian Rogers <irogers@xxxxxxxxxx>
> >>> ---
> >>> tools/perf/pmu-events/intel_metrics.py | 51 ++++++++++++++++++++++++++
> >>> 1 file changed, 51 insertions(+)
> >>>
> >>> diff --git a/tools/perf/pmu-events/intel_metrics.py b/tools/perf/pmu-events/intel_metrics.py
> >>> index 20c25d142f24..1096accea2aa 100755
> >>> --- a/tools/perf/pmu-events/intel_metrics.py
> >>> +++ b/tools/perf/pmu-events/intel_metrics.py
> >>> @@ -7,6 +7,7 @@ import argparse
> >>> import json
> >>> import math
> >>> import os
> >>> +from typing import Optional
> >>>
> >>> parser = argparse.ArgumentParser(description="Intel perf json generator")
> >>> parser.add_argument("-metricgroups", help="Generate metricgroups data", action='store_true')
> >>> @@ -77,10 +78,60 @@ def Smi() -> MetricGroup:
> >>> ])
> >>>
> >>>
> >>> +def Tsx() -> Optional[MetricGroup]:
> >>> + if args.model not in [
> >>> + 'alderlake',
> >>> + 'cascadelakex',
> >>> + 'icelake',
> >>> + 'icelakex',
> >>> + 'rocketlake',
> >>> + 'sapphirerapids',
> >>> + 'skylake',
> >>> + 'skylakex',
> >>> + 'tigerlake',> + ]:
> >>
> >> Can we get ride of the model list? Otherwise, we have to keep updating
> >> the list.
> >
> > Do we expect the list to update? :-)
>
> Yes, at least for the meteorlake and graniterapids. They should be the
> same as alderlake and sapphirerapids. I'm not sure about the future
> platforms.
>
> Maybe we can have a if args.model in list here to include all the
> non-hybrid models which doesn't support TSX. I think the list should not
> be changed shortly.
>
> > The issue is the events are in
> > sysfs and not the json. If we added the tsx events to json then this
> > list wouldn't be necessary, but it also would mean the events would be
> > present in "perf list" even when TSX is disabled.
>
> I think there may an alternative way, to check the RTM events, e.g.,
> RTM_RETIRED.START event. We only need to generate the metrics for the
> platform which supports the RTM_RETIRED.START event.
>
>
> >
> >>> + return None
> >>> +> + pmu = "cpu_core" if args.model == "alderlake" else "cpu"
> >>
> >> Is it possible to change the check to the existence of the "cpu" PMU
> >> here? has_pmu("cpu") ? "cpu" : "cpu_core"
> >
> > The "Unit" on "cpu" events in json always just blank. On hybrid it is
> > either "cpu_core" or "cpu_atom", so I can make this something like:
> >
> > pmu = "cpu_core" if metrics.HasPmu("cpu_core") else "cpu"
> >
> > which would be a build time test.
>
> Yes, I think using the "Unit" is good enough.
>
> >
> >
> >>> + cycles = Event('cycles')
> >>> + cycles_in_tx = Event(f'{pmu}/cycles\-t/')
> >>> + transaction_start = Event(f'{pmu}/tx\-start/')
> >>> + cycles_in_tx_cp = Event(f'{pmu}/cycles\-ct/')
> >>> + metrics = [
> >>> + Metric('tsx_transactional_cycles',
> >>> + 'Percentage of cycles within a transaction region.',
> >>> + Select(cycles_in_tx / cycles, has_event(cycles_in_tx), 0),
> >>> + '100%'),
> >>> + Metric('tsx_aborted_cycles', 'Percentage of cycles in aborted transactions.',
> >>> + Select(max(cycles_in_tx - cycles_in_tx_cp, 0) / cycles,
> >>> + has_event(cycles_in_tx),
> >>> + 0),
> >>> + '100%'),
> >>> + Metric('tsx_cycles_per_transaction',
> >>> + 'Number of cycles within a transaction divided by the number of transactions.',
> >>> + Select(cycles_in_tx / transaction_start,
> >>> + has_event(cycles_in_tx),
> >>> + 0),
> >>> + "cycles / transaction"),
> >>> + ]
> >>> + if args.model != 'sapphirerapids':
> >>
> >> Add the "tsx_cycles_per_elision" metric only if
> >> has_event(f'{pmu}/el\-start/')?
> >
> > It's a sysfs event, so this wouldn't work :-(
>
> The below is the definition of el-start in the kernel.
> EVENT_ATTR_STR(el-start, el_start, "event=0xc8,umask=0x1");
>
> The corresponding event in the event list should be HLE_RETIRED.START
> "EventCode": "0xC8",
> "UMask": "0x01",
> "EventName": "HLE_RETIRED.START",
>
> I think we may check the HLE_RETIRED.START instead. If the
> HLE_RETIRED.START doesn't exist, I don't see a reason why the
> tsx_cycles_per_elision should be supported.
>
> Again, in the virtualization world, it's possible that the
> HLE_RETIRED.START exists in the event list but el_start isn't available
> in the sysfs. I think it has to be specially handle in the test as well.

So we keep the has_event test on the sysfs event to handle the
virtualization and disabled case. We use HLE_RETIRED.START to detect
whether the model supports TSX. Should the event be the sysfs or json
version? i.e.

"MetricExpr": "(cycles\\-t / el\\-start if
has_event(el\\-start) else 0)",

or

"MetricExpr": "(cycles\\-t / HLE_RETIRED.START if
has_event(el\\-start) else 0)",

I think I favor the former for some consistency with the has_event.

Using HLE_RETIRED.START means the set of TSX models goes from:
'alderlake',
'cascadelakex',
'icelake',
'icelakex',
'rocketlake',
'sapphirerapids',
'skylake',
'skylakex',
'tigerlake',

To:
broadwell
broadwellde
broadwellx
cascadelakex
haswell
haswellx
icelake
rocketlake
skylake
skylakex

Using RTM_RETIRED.START it goes to:
broadwell
broadwellde
broadwellx
cascadelakex
emeraldrapids
graniterapids
haswell
haswellx
icelake
icelakex
rocketlake
sapphirerapids
skylake
skylakex
tigerlake

So I'm not sure it is working equivalently to what we have today,
which may be good or bad. Here is what I think the code should look
like:

def Tsx() -> Optional[MetricGroup]:
pmu = "cpu_core" if CheckPmu("cpu_core") else "cpu"
cycles = Event('cycles')
cycles_in_tx = Event(f'{pmu}/cycles\-t/')
cycles_in_tx_cp = Event(f'{pmu}/cycles\-ct/')
try:
# Test if the tsx event is present in the json, prefer the
# sysfs version so that we can detect its presence at runtime.
transaction_start = Event("RTM_RETIRED.START")
transaction_start = Event(f'{pmu}/tx\-start/')
except:
return None

elision_start = None
try:
# Elision start isn't supported by all models, but we'll not
# generate the tsx_cycles_per_elision metric in that
# case. Again, prefer the sysfs encoding of the event.
elision_start = Event("HLE_RETIRED.START")
elision_start = Event(f'{pmu}/el\-start/')
except:
pass

return MetricGroup('transaction', [
Metric('tsx_transactional_cycles',
'Percentage of cycles within a transaction region.',
Select(cycles_in_tx / cycles, has_event(cycles_in_tx), 0),
'100%'),
Metric('tsx_aborted_cycles', 'Percentage of cycles in aborted
transactions.',
Select(max(cycles_in_tx - cycles_in_tx_cp, 0) / cycles,
has_event(cycles_in_tx),
0),
'100%'),
Metric('tsx_cycles_per_transaction',
'Number of cycles within a transaction divided by the
number of transactions.',
Select(cycles_in_tx / transaction_start,
has_event(cycles_in_tx),
0),
"cycles / transaction"),
Metric('tsx_cycles_per_elision',
'Number of cycles within a transaction divided by the
number of elisions.',
Select(cycles_in_tx / elision_start,
has_event(elision_start),
0),
"cycles / elision") if elision_start else None,
], description="Breakdown of transactional memory statistics")

Wdyt?

Thanks,
Ian

> Thanks,
> Kan
>
> >
> > Thanks,
> > Ian
> >
> >> Thanks,
> >> Kan
> >>
> >>> + elision_start = Event(f'{pmu}/el\-start/')
> >>> + metrics += [
> >>> + Metric('tsx_cycles_per_elision',
> >>> + 'Number of cycles within a transaction divided by the number of elisions.',
> >>> + Select(cycles_in_tx / elision_start,
> >>> + has_event(elision_start),
> >>> + 0),
> >>> + "cycles / elision"),
> >>> + ]
> >>> + return MetricGroup('transaction', metrics)
> >>> +
> >>> +
> >>> all_metrics = MetricGroup("", [
> >>> Idle(),
> >>> Rapl(),
> >>> Smi(),
> >>> + Tsx(),
> >>> ])
> >>>
> >>> if args.metricgroups:
> >