Re: [RFC PATCH v1] perf parse-events: Make legacy events lower priority than sysfs/json

From: Ian Rogers
Date: Thu Nov 23 2023 - 13:00:20 EST


On Thu, Nov 23, 2023 at 8:09 AM Marc Zyngier <maz@xxxxxxxxxx> wrote:
>
> On Thu, 23 Nov 2023 15:27:54 +0000,
> Ian Rogers <irogers@xxxxxxxxxx> wrote:
> >
> > On Thu, Nov 23, 2023 at 7:16 AM Marc Zyngier <maz@xxxxxxxxxx> wrote:
> > >
> > > Again, perf gets shipped in distros, and not necessarily as the
> > > latest version. Rather, they tend to ship the version matching the
> > > kernel. No backport, buggy perf.
> >
> > Please complain to the distros. I complained to Debian, we got rid of
> > the horrible wrapper script thing they did. I complained to two
> > separate Ubuntu people over the last two weeks as they still have
> > broken packaging even though they derive from Debian. Fedora is of
> > course perfect as Arnaldo oversees it :-)
>
> In this instance, I don't need to complain to anyone but you. And
> guess what: it is on Fedora that this issue was first discovered.
>
> I also don't see what distro packaging policy has to do with the
> issue at hand, but that's beside the point.

Because the latest perf tool always carries improvements and fixes,
just as, say, gcc or clang do. We don't ask those tools to backport
fixes and then deliberately run out-of-date versions of them.

> >
> > > And again, I don't see a bug in the PMU driver.
> >
> > Whether the PMU driver is asked for cycles as a legacy event or as
> > an event code, the PMU driver should support it. Supporting legacy
> > events is just something core PMU drivers do. This workaround
> > wouldn't be necessary were it not for this PMU bug.
>
> Again, *which* PMU bug? What is a legacy event, and when has this
> terminology made it into the kernel? Who has decided that a change was
> necessary? Why haven't you submitted patches upgrading all the PMU
> drivers to support whatever you are referring to?

I did fix ARM's PMU driver for extended types; James Clark took over
the patch. The term "legacy" has been in use in kernel source code for
over 11 years:
http://lkml.kernel.org/r/1337584373-2741-4-git-send-email-jolsa@xxxxxxxxxx
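
To make "extended types" concrete, here is a minimal sketch of how a
legacy hardware event such as cycles can be aimed at one specific core
PMU: the PMU's type ID goes in the upper 32 bits of perf_event_attr.config
and the legacy event ID stays in the lower 32 bits. The constant values
mirror include/uapi/linux/perf_event.h but are redefined with an EX_
prefix so the sketch builds without kernel headers; the helper names are
hypothetical and are not taken from the perf tool or any driver.

```c
#include <stdint.h>

/*
 * Values mirror include/uapi/linux/perf_event.h. They are redefined here
 * with an EX_ prefix (an assumption of this sketch) so that it compiles
 * without the kernel uapi headers.
 */
#define EX_PERF_TYPE_HARDWARE        0u  /* legacy hardware event type */
#define EX_PERF_COUNT_HW_CPU_CYCLES  0u  /* legacy "cycles" event ID */
#define EX_PERF_PMU_TYPE_SHIFT       32
#define EX_PERF_HW_EVENT_MASK        0xffffffffULL

/*
 * Hypothetical helper: build an extended-type config value.
 * pmu_type is the number read from
 * /sys/bus/event_source/devices/<pmu>/type; 0 means "no specific PMU".
 */
static uint64_t ex_legacy_hw_config(uint32_t pmu_type, uint64_t hw_id)
{
	return ((uint64_t)pmu_type << EX_PERF_PMU_TYPE_SHIFT) |
	       (hw_id & EX_PERF_HW_EVENT_MASK);
}

/* Hypothetical helper: recover the PMU type ID a driver would match on. */
static uint32_t ex_config_pmu_type(uint64_t config)
{
	return (uint32_t)(config >> EX_PERF_PMU_TYPE_SHIFT);
}

/* Hypothetical helper: recover the legacy hardware event ID. */
static uint64_t ex_config_hw_id(uint64_t config)
{
	return config & EX_PERF_HW_EVENT_MASK;
}
```

A caller would set attr.type to the hardware type and attr.config to
ex_legacy_hw_config(type_id, EX_PERF_COUNT_HW_CPU_CYCLES), where type_id
comes from sysfs. A driver that ignores the upper bits cannot tell which
core PMU of a heterogeneous system the event was meant for.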

An issue I face in fixing somebody else's PMU driver is that it is ever
so useful to be able to test it. The work done with James was done blind
by me, except for checking for regressions on a Raspberry Pi 4, which
isn't heterogeneous (nor is the 5 *sigh*). The fact that there were bugs
in ARM's PMU driver for so long shows a lack of testing by ARM, and we've
been going out of our way to increase testing. Something positive ARM
could do in this area is to update the parse-events test (yes, the one
that is supposed to catch issues like this) so that the hardcoded "cpu"
PMU assumption, which works on most platforms that name their core PMU
"cpu", also works on ARM. For bonus points, setting up testing so that
we know when things break would be useful. As mentioned in previous
emails, I hope to move away from needing an actual machine to test the
perf tool's correctness, but we're a long way from that. There are
very many big.LITTLE Android devices in the field where the PMUs are
not set up as heterogeneous; ARM could contribute a CTS test to
Android to make sure this doesn't happen.

Thanks,
Ian

> > This change impacts every user of perf; it is not just a partial fix
> > to work around ARM PMU driver issues. See the updated parse-events
> > test for a list of what a simple test sees as a behavior change.
>
> When making far-reaching changes to a subsystem, I apply two rules:
>
> - I address everything that is affected, not just my pet architecture
>
> - I don't break other people's toys, which means compatibility is a
> *must*, not a 'nice to have'
>
> By this standard, your complaining that "ARM is broken" doesn't hold.
> It was working just fine until your changes rendered perf unusable.
>
> Nonetheless, thank you for addressing it quickly. This is sincerely
> appreciated.
>
> M.
>
> --
> Without deviation from the norm, progress is not possible.