Re: [RFC PATCH v1] perf parse-events: Make legacy events lower priority than sysfs/json

From: Arnaldo Carvalho de Melo
Date: Thu Nov 23 2023 - 16:39:12 EST


On Thu, Nov 23, 2023 at 02:37:31PM +0000, Mark Rutland wrote:
> Hi Ian,
>
> Thanks for this!

Yeah, it seems we're making progress, thanks for the continuous effort
in getting this fixed!

> On Wed, Nov 22, 2023 at 08:29:22PM -0800, Ian Rogers wrote:
> > The perf tool has previously made legacy events the priority so with
> > or without a PMU the legacy event would be opened:

<SNIP>

> > The bulk of this change is updating all of the parse-events test
> > expectations so that if a sysfs/json event exists for a PMU the test
> > doesn't fail - a further sign, if it were needed, that the legacy
> > event priority was a known and tested behavior of the perf tool.

> > Signed-off-by: Ian Rogers <irogers@xxxxxxxxxx>

> Regardless of my comments below, for this patch as-is:

> Acked-by: Mark Rutland <mark.rutland@xxxxxxx>

I'm collecting this even with the problems in some setups so far, thanks
for providing it.

> > ---
> > This is a large behavioral change:
> > 1) the scope of the change means it should bake on linux-next and I
> > don't believe should be a 6.7-rc fix.
>
> I'm happy for this to bake, but I do think it needs to be backported for the
> sake of users, especially given that it *restores* the old behaviour.
>
> > 2) a fixes tag and stable backport I don't think are appropriate.

> For the sake of users I think a fixes tag and stable backport are necessary. In
> practice distributions ship the perf tool associated with their stable kernel,
> so (for better or worse) a stable backport is certainly necessary for distros
> that'll use the v6.6 stable kernel.

That, as Ian mentioned, is a common misconception. The lack of lockstep
between perf and kernel versions was never properly stated in the
documentation, only in the source code: look for the
evsel__disable_missing_features() function, which tries to do whatever
we can manage of what was asked (new features on old kernels), given
the laconic responses perf_event_open() gives back to those requests.
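
A minimal sketch of that fallback idea, not the actual perf code: the
feature bits, mock_kernel_open() and open_with_fallback() are all
hypothetical names made up for illustration, and the "kernel" here is
simulated so the example runs anywhere. It only shows the general
pattern of retrying with fewer features when the only answer is EINVAL.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical feature bits; these are NOT real perf_event_attr fields. */
enum {
	FEAT_A = 1u << 0,	/* baseline, never dropped in this sketch */
	FEAT_B = 1u << 1,
	FEAT_C = 1u << 2,	/* newest */
};

/*
 * Stand-in for perf_event_open(): the answer is laconic, just -EINVAL,
 * with no hint about which requested feature was rejected.
 */
static int mock_kernel_open(uint32_t requested, uint32_t kernel_supports)
{
	if (requested & ~kernel_supports)
		return -EINVAL;
	return 3;	/* pretend file descriptor */
}

/*
 * Try the open; on -EINVAL drop the newest still-requested optional
 * feature and retry. On return *requested holds what was last tried.
 */
static int open_with_fallback(uint32_t *requested, uint32_t kernel_supports)
{
	static const uint32_t newest_first[] = { FEAT_C, FEAT_B };
	const size_t n = sizeof(newest_first) / sizeof(newest_first[0]);
	size_t i = 0;

	assert(requested != NULL);

	for (;;) {
		int fd = mock_kernel_open(*requested, kernel_supports);

		if (fd != -EINVAL)
			return fd;

		/* Skip optional features that were not requested anyway. */
		while (i < n && !(*requested & newest_first[i]))
			i++;
		if (i == n)
			return fd;	/* nothing left to drop */

		*requested &= ~newest_first[i++];
	}
}
```

With a simulated old kernel that only knows FEAT_A and FEAT_B, a request
for all three ends up opened with FEAT_C silently dropped. The fixed
newest-first order is a simplification: it may drop a feature the kernel
would have accepted.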

But the fact is that most if not all distros think perf is in lockstep
with the kernel, which is not the intent.

That said, for distros that do backports, this is one to be done, and
for stable@xxxxxxxxxx, yeah, I also think this is one to be flagged as
that, but since this hybrid thing has such a miscoordinated
user/kernel/arches history, with such a great number of nuances and
interpretations, I think we'd better continue to test it for a while,
in perf-tools-next/perf-tools-next and linux-next, and then flag it for
backports.

> > The real reported issue is with the PMU driver.
>
> Having trawled through the driver and core perf code, I don't believe the PMU
> driver is at fault. Please see my analysis at:
>
> https://lore.kernel.org/lkml/ZV9gThJ52slPHqlV@xxxxxxxxxxxxxxxxxxxxxxxxxxxxxx/
>
> ... where it looks like the perf tool is dropping the extended type ID in some
> cases.

> If you know of a specific bug in the PMU driver or perf core code, please let
> me know and I will investigate. As it stands we have no evidence of a bug in
> the PMU driver, and pretty clear evidence (as linked above) that there is
> *some* issue in userspace. In the absence of such evidence, please don't assert
> that there must be a kernel bug.
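
For reference, a small sketch of what an "extended type ID" looks like
in a legacy hardware event config, assuming the layout from the uapi
header (PMU type in the upper 32 bits, hardware event ID in the lower
32). The macros mirror PERF_PMU_TYPE_SHIFT and PERF_HW_EVENT_MASK; the
helper names and the PMU type value used in the usage note are
hypothetical. It illustrates the encoding only and says nothing about
where the reported problem actually is.

```c
#include <assert.h>
#include <stdint.h>

/* Local mirrors of PERF_PMU_TYPE_SHIFT and PERF_HW_EVENT_MASK. */
#define PMU_TYPE_SHIFT	32
#define HW_EVENT_MASK	0xffffffffULL

/* Hypothetical helper: build a config that names a specific PMU. */
static uint64_t encode_extended_hw_config(uint32_t pmu_type,
					  uint32_t hw_event_id)
{
	return ((uint64_t)pmu_type << PMU_TYPE_SHIFT) | hw_event_id;
}

/* Hypothetical helper: PMU type carried in the upper 32 bits. */
static uint32_t config_pmu_type(uint64_t config)
{
	return (uint32_t)(config >> PMU_TYPE_SHIFT);
}

/* Hypothetical helper: legacy hardware event ID in the lower 32 bits. */
static uint32_t config_hw_event(uint64_t config)
{
	return (uint32_t)(config & HW_EVENT_MASK);
}
```

With a made-up PMU type of 8 and hardware event ID 1,
encode_extended_hw_config(8, 1) gives 0x0000000800000001. Masking that
value with HW_EVENT_MASK leaves only the event ID, so the config no
longer names a specific PMU.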

> > A backport would bring the
> > risk that later fixes, due to the large behavior change, wouldn't be
> > backported and past releases get regressed in scenarios like
> > hybrid. Backports for the perf tool are also less necessary than say a
> > buggy PMU driver, as distributions should be updating to the latest
> > perf tool regardless of what Linux kernel is being run (the perf tool
> > is backward compatible).

> As above I believe that a backport is necessary.

Agreed, as we get this tested a bit.

- Arnaldo