Re: [PATCH V2 3/5] perf mem: Clean up perf_mem_events__name()

From: Ian Rogers
Date: Wed Dec 13 2023 - 12:33:43 EST


On Wed, Dec 13, 2023 at 5:33 AM Leo Yan <leo.yan@xxxxxxxxxx> wrote:
>
> On Mon, Dec 11, 2023 at 01:39:36PM -0500, Liang, Kan wrote:
> > On 2023-12-09 12:48 a.m., Leo Yan wrote:
> > > On Thu, Dec 07, 2023 at 11:23:36AM -0800, kan.liang@xxxxxxxxxxxxxxx wrote:
> > >> From: Kan Liang <kan.liang@xxxxxxxxxxxxxxx>
> > >>
> > >> Introduce a generic perf_mem_events__name(). Remove the ARCH-specific
> > >> one.
> > >
> > > Now memory event naming is arch dependent, this is because every arch
> > > has different memory PMU names, and it appends specific configurations
> > > (e.g. aarch64 appends 'ts_enable=1,...,min_latency=%u', and x86_64
> > > appends 'ldlat=%u', etc). This is not friendly for extension, e.g. it's
> > > impossible for users to specify any extra attributes for memory events.
> > >
> > > This patch tries to consolidate in a central place for generating memory
> > > event names, its approach is to add more flags to meet special usage
> > > cases, which means the code gets more complex (and more difficult for
> > > later's maintenance).
> > >
> > > I think we need to distinguish the event naming into two levels: in
> > > util/mem-events.c, we can consider it as a common layer, and we maintain
> > > common options in it, e.g. 'latency', 'all-user', 'all-kernel',
> > > 'timestamp', 'physical_address', etc. In every arch's mem-events.c
> > > file, it converts the common options to arch specific configurations.
> > >
> > > In the end, this can avoid to add more and more flags into the
> > > structure perf_mem_event.
> >
> > The purpose of this cleanup patch set is to remove the unnecessary
> > __weak functions, and try to make the code more generic.
>
> I understand weak functions are not very good practice. The point is
> weak functions are used for some archs have implemented a feature but
> other archs have not.
>
> I can think a potential case to use a central place to maintain the
> code for all archs - when we want to support cross analysis. But this
> patch series is for supporting cross analysis, to be honest, I still
> don't see benefit for this series, though I know you might try to
> support hybrid modes.

So thanks to Kan for doing this series and trying to clean the code
up. My complaint about the code is that it was overly hard wired.
Heck, we have event strings to parse that hard code PMU and event
names. In fixing hybrid my complaint was that we were adding yet more
complexity and as a lot of this was resting on printf format strings
it was hard to check that these were being used correctly. The
direction of this patch series I agree with.

Imo we should avoid weak definitions. Weak definitions are outside of
the C specification and have introduced bugs into the code base -
specifically a weak const array was having its size propagated into
code but then the linker later replaced that weak array. An
architecture #ifdef is better than a weak definition as the behavior
is defined and things blow up early rather than suffering from subtle
corruptions.

The Linux kernel device driver is abstracting what the hardware is
doing and since the more recent changes the PMU abstraction in the
perf tool is trying to match that. If we need to interface with PMUs
differently on each architecture then something is wrong. It happens
that things are wrong and we need to work with that. For example, on
intel there are topdown events that introduce ordering issues. We have
default initialization functions for different PMUs. The perf_pmu
struct is created in perf_pmu__lookup and that always calls an
perf_pmu__arch_init where the name of the PMU is already set. In the
weak perf_pmu__arch_init we tweak the perf_pmu struct so that it will
behave correctly elsewhere in the code. These changes are paralleling
that. That said, the Intel perf_pmu__arch_init does strcmps against
"intel_pt" and "intel_bts", does it really need to be arch specific
given it is already PMU specific? A situation we see in testing is
people trying to run user space ISA emulators like qemu, say ARM on
x86, should the PMU set up for intel_pt and intel_bts be broken in
this environment? I think that as the names are very specific I'd
prefer if the code were outside of the tools/perf/arch directory.
There are cases with PMU names like "cpu" where we're probably going
to need ifdefs or runtime is_intel() calls.

Anyway, my point is that I think we should be moving away from arch
specific stuff, as this patch series tries, and that way we have the
best chance of changes benefitting users regardless of hardware. It
may be that to make all of this work properly we need to modify PMUs,
but that all seems good, such as adding the extended type support for
legacy events on ARM PMUs so that legacy events can work without a
fixed CPU. We haven't got core PMUs working properly, see the recent
perf top hybrid problem. There are obviously issues with uncore as,
for example, most memory controllers are replicated PMUs but some
kernel perf drivers select a memory controller via a config value. We
either need to change the drivers for some kind of uniformity or do
some kind of abstracting in the perf tool. I think we'll probably need
to do it in the tool and when doing that we really shouldn't be doing
it in an arch specific or weak function way.

Thanks,
Ian

> > The two flags has already covered all the current usage.
> > For now, it's good enough.
> >
> > I agree that if there are more flags, it should not be a perfect
> > solution. But we don't have the third flag right now. Could we clean up
> > it later e.g., when introducing the third flag?
> >
> > I just don't want to make the patch bigger and bigger. Also, I don't
> > think we usually implement something just for future possibilities.
>
> Fine for me, but please address two main concerns in next spin:
>
> - Fix building failure in patch 01;
> - Fix the concerned regression on Arm64/AMD archs in patch 04. I will
> give a bit more explanation in another reply.
>
>
> [...]
>
> > >> -#define E(t, n, s) { .tag = t, .name = n, .sysfs_name = s }
> > >> +#define E(t, n, s, l, a) { .tag = t, .name = n, .sysfs_name = s, .ldlat = l, .aux_event = a }
> > >>
> > >> struct perf_mem_event perf_mem_events_arm[PERF_MEM_EVENTS__MAX] = {
> > >> - E("spe-load", "arm_spe_0/ts_enable=1,pa_enable=1,load_filter=1,store_filter=0,min_latency=%u/", "arm_spe_0"),
> > >> - E("spe-store", "arm_spe_0/ts_enable=1,pa_enable=1,load_filter=0,store_filter=1/", "arm_spe_0"),
> > >> - E("spe-ldst", "arm_spe_0/ts_enable=1,pa_enable=1,load_filter=1,store_filter=1,min_latency=%u/", "arm_spe_0"),
> > >> + E("spe-load", "%s/ts_enable=1,pa_enable=1,load_filter=1,store_filter=0,min_latency=%u/", "arm_spe_0", true, 0),
> > >> + E("spe-store", "%s/ts_enable=1,pa_enable=1,load_filter=0,store_filter=1/", "arm_spe_0", false, 0),
> > >> + E("spe-ldst", "%s/ts_enable=1,pa_enable=1,load_filter=1,store_filter=1,min_latency=%u/", "arm_spe_0", true, 0),
> > >
> > > Arm SPE is AUX event, should set '1' to the field '.aux_event'.
> >
> > It actually means whether an extra event is required with a mem_loads
> > event. I guess for ARM the answer is no.
>
> Thanks for confirmation.
>
> > > I am a bit suspect if we really need the field '.aux_event', the
> > > '.aux_event' field is only used for generating event string.
> >
> > No, it stores the event encoding for the extra event.
> > ARM doesn't need it, so it's 0.
>
> I searched a bit and confirmed '.aux_event' is only used in
> util/mem-events.c and for 'perf record'.
>
> I failed to connect the code with "it stores the event encoding for the
> extra event". Could you elaborate a bit for this?
>
> Thanks,
> Leo