Re: [PATCH v2] perf stat: Introduce skippable evsels

From: Ian Rogers
Date: Wed Apr 19 2023 - 20:24:23 EST


On Wed, Apr 19, 2023 at 11:57 AM Liang, Kan <kan.liang@xxxxxxxxxxxxxxx> wrote:
>
>
> On 2023-04-19 12:51 p.m., Ian Rogers wrote:
> >>>>> With that change I don't have a case that requires skippable evsels,
> >>>>> and so we can take that series with patch 6 over the v1 of that series
> >>>>> with this change.
> >>>>>
> >>>>
> >>>> I'm afraid this is not the only problem the commit 94b1a603fca7 ("perf
> >>>> stat: Add TopdownL1 metric as a default if present") in the
> >>>> perf-tools-next branch introduced.
> >>>>
> >>>> The topdown L2 in the perf stat default on SPR and big core of the ADL
> >>>> is still missed. I don't see a possible fix for this on the current
> >>>> perf-tools-next branch.
> >>>
> >>> I thought in its current state the json metrics for TopdownL2 on SPR
> >>> have multiplexing. Given L1 is used to drill down to L2, it seems odd
> >>> to start on L2, but given L1 is used to compute the thresholds for L2,
> >>> this should be to have both L1 and L2 on these platforms. However,
> >>> that doesn't work as you don't want multiplexing.
> >>>
> >>> This all seems backward to avoid potential multiplexing on branch miss
> >>> rate and IPC, just always having TopdownL1 seems cleanest with the
> >>> skippable evsels working around the permissions issue - as put forward
> >>> in this patch. Possibly adding L2 metrics on ADL/SPR, but only once
> >>> the multiplexing issue is resolved.
> >>>
> >>
> >> No, not just that issue. Based to what I tested these days, perf stat
> >> default has issues/regressions on most of the Intel platforms with the
> >> current perf-tools-next and perf/core branch of acme's repo.
> >>
> >> For the pre-ICL platforms:
> >> - The permission issue. (This patch tried to address.)
> >> - Unclean perf stat default. (This patch failed to address.)
> >> Unnecessary multiplexing for cycles.
> >> Display partial of the TopdownL1
> >>
> >> https://lore.kernel.org/lkml/d1fe801a-22d0-1f9b-b127-227b21635bd5@xxxxxxxxxxxxxxx/
> >>
> >> For SPR platforms
> >> - Topdown L2 metrics is missed, while it works with the current 6.3-rc7.
> >>
> >> For ADL/RPL platforms
> >> - Segmentation fault which I just found this morning.
> >> # ./perf stat true
> >> Segmentation fault (core dumped)
> >
> > This may also stem from the reference count checking work that Arnaldo
> > is currently merging. It is hard to test hybrid because it uses
> > non-generic code paths.
>
> There are two places which causes the Segmentation fault.
> One place is the TopdownL1.
>
> After I disable the TopdownL1 and add !counter->name as below, there are
> no errors for the ./perf stat true.
>
> (The below is just for test purpose.)
> diff --git a/tools/perf/builtin-stat.c b/tools/perf/builtin-stat.c
> index 7a641a67486d..8e12ed1141e0 100644
> --- a/tools/perf/builtin-stat.c
> +++ b/tools/perf/builtin-stat.c
> @@ -1892,7 +1892,7 @@ static int add_default_attributes(void)
> * Add TopdownL1 metrics if they exist. To minimize
> * multiplexing, don't request threshold computation.
> */
> - if (metricgroup__has_metric("TopdownL1")) {
> + if (0 && metricgroup__has_metric("TopdownL1")) {

So hybrid has something different that causes this. Can you provide
the information to solve?

> struct evlist *metric_evlist = evlist__new();
> struct evsel *metric_evsel;
>
> diff --git a/tools/perf/util/stat-display.c b/tools/perf/util/stat-display.c
> index 6b46bbb3d322..072fa56744b4 100644
> --- a/tools/perf/util/stat-display.c
> +++ b/tools/perf/util/stat-display.c
> @@ -747,7 +747,7 @@ static void uniquify_event_name(struct evsel *counter)
> int ret = 0;
>
> if (counter->uniquified_name || counter->use_config_name ||
> - !counter->pmu_name || !strncmp(counter->name, counter->pmu_name,
> + !counter->pmu_name || !counter->name ||
> !strncmp(counter->name, counter->pmu_name,
> strlen(counter->pmu_name)))
> return;

Is this a pre-existing hybrid bug? It is a real shame hybrid shows so
few common code paths. In general evsel__name should be preferred over
directly accessing name.

> >
> >> After the test on a hybrid machine, I incline to revert the commit
> >> 94b1a603fca7 ("perf stat: Add TopdownL1 metric as a default if present")
> >> and related patches for now.
> >>
> >> To clarify, I do not object a generic solution for the Topdown on
> >> different ARCHs. But the current generic solution aka TopdownL1 has all
> >> kinds of problems on most of Intel platforms. We should fix them first
> >> before applying to the mainline.
> >
> > No, 6.3 has many issues as do the default events/metrics:
>
> To be honest, I don't think they are real critical issues. For the first
> one, I think there was already a temporary fix. For the others, they are
> there for years.

This isn't true. The aggregation bug was raised to me by Intel and
stems from aggregation refactoring in per-thread mode done by
Namhyung.

> However, the solution you proposed in the huge patch set
> (https://lore.kernel.org/lkml/20230219092848.639226-37-irogers@xxxxxxxxxx/)
> brings many critical issues on different Intel platforms, crashes,
> Missing features, etc.
> Also, I was just told that many of our existing tools which on top of
> the perf tool will also be broken, because all the annotations of the
> kernel top-down metrics event disappeared.
>
> So we really should revert the patches. I don't think patches 39 to 51
> are well-tested and reviewed.

The only issue I'm aware of is that hard coded use of the inaccurate
hard coded metrics now needs to switch to json metrics. This seems a
worthwhile update anyway, and not one that would justify breaking perf
stat metrics.

Thanks,
Ian

> Thanks,
> Kan
>
> > - in 6.3 aggregation is wrong for metrics, specifically double
> > counting happens as the summary gets applied to the saved value under
> > certain conditions like repeat runs. This is solved in perf-tools-next
> > by removing the duplicated aggregation and using a single set of
> > counters. The "shadow" code is both simpler and more correct than in
> > 6.3, reverting it would be a massive regression - tbh, if it wasn't
> > for the size of the patches I think we should cherry-pick what is in
> > perf-tools-next back to 6.3 due to the size of the errors.
> > - the default events for IPC and branch miss rate lack groups and so
> > are inaccurate when multiplexing happens - but multiplexing inaccuracy
> > has historically not been an issue for the default events/metrics.
> > - the previous default topdown metrics were inaccurate compared to
> > the TMA metrics spreadsheet, specifically they lacked events to
> > correct for errors and the thresholds differed. This is why TopdownL2
> > is a challenge because the json metrics do things properly causing the
> > event groups not to be trivially shared.
> > - the topdown event solution is not generic, it is not only Intel
> > specific but Intel model specific. Solving the problem via the json is
> > the most consistent way to do this and is what I suggest we move to
> > in:
> > https://lore.kernel.org/lkml/20230419005423.343862-6-irogers@xxxxxxxxxx/
> > but you can keep pushing that perf should do something special for
> > Intel and for every Intel model. It also pushes a need into the metric
> > code to do hybrid specific things so we can ignore atom TopdownL1. In
> > this vein, hybrid is a giant mess that the code base needs cleaning up
> > from; take the recent patch to just disable generic event parsing
> > tests for MTL:
> > https://lore.kernel.org/lkml/20230411094330.653965-1-tinghao.zhang@xxxxxxxxx/
> > We need to stop doing this, and have generic code, not least because
> > of complaints like:
> > https://www.kernel.org/doc/html/latest/gpu/i915.html#issues-hit-with-first-prototype-based-on-core-perf
> > and because of the crash you are reporting on ADL/RPL.
> >
> > What has motivated you here is a concern over multiplexing on branch
> > miss rate and IPC on pre-Icelake architectures that I find hard to
> > understand and is trivially remedied in time honored fashions, ie
> > profile the metrics you care about separately. A far more egregious
> > multiplexing problem is that on Icelake+ architectures we multiplex
> > fixed and generic counters unnecessarily when the same fixed counter
> > is needed or there are insufficient generic counters. This is
> > egregious as there is no user workaround and the kernel fix, to
> > iterate all hardware events the same way we do for all software
> > events, is both obvious and trivial. We've lived with this
> > multiplexing problem since Icelake and I think we can live with a
> > possible multiplexing problem, on legacy architectures, with what is
> > in perf-tools-next. The permissions problem tackled here means we
> > should merge this.
> >
> > Thanks,
> > Ian
> >
> >> Thanks,
> >> Kan