Re: [PATCH v3 27/46] perf print-events: Print legacy cache events for each PMU

From: Ian Rogers
Date: Tue May 02 2023 - 13:40:55 EST


On Tue, May 2, 2023 at 3:48 AM Ravi Bangoria <ravi.bangoria@xxxxxxx> wrote:
>
> On 29-Apr-23 11:04 AM, Ian Rogers wrote:
> > Mirroring parse_events_add_cache, list the legacy name alongside its
> > alias with the PMU. Remove the now unnecessary hybrid logic.
>
> Before patch:
>
> ```
> $ sudo ./perf list
> ...
> duration_time [Tool event]
> user_time [Tool event]
> system_time [Tool event]
> L1-dcache-loads [Hardware cache event]
> L1-dcache-load-misses [Hardware cache event]
> L1-dcache-prefetches [Hardware cache event]
> L1-icache-loads [Hardware cache event]
> L1-icache-load-misses [Hardware cache event]
> dTLB-loads [Hardware cache event]
> dTLB-load-misses [Hardware cache event]
> iTLB-loads [Hardware cache event]
> iTLB-load-misses [Hardware cache event]
> branch-loads [Hardware cache event]
> branch-load-misses [Hardware cache event]
> branch-brs OR cpu/branch-brs/ [Kernel PMU event]
> branch-instructions OR cpu/branch-instructions/ [Kernel PMU event]
> branch-misses OR cpu/branch-misses/ [Kernel PMU event]
> ...
> ```
>
> After patch:
>
> ```
> $ sudo ./perf list
> ...
> duration_time [Tool event]
> user_time [Tool event]
> system_time [Tool event]
>
> cpu:
> L1-dcache-loads OR cpu/L1-dcache-loads/
> L1-dcache-load-misses OR cpu/L1-dcache-load-misses/
> L1-dcache-prefetches OR cpu/L1-dcache-prefetches/
> L1-icache-loads OR cpu/L1-icache-loads/
> L1-icache-load-misses OR cpu/L1-icache-load-misses/
> dTLB-loads OR cpu/dTLB-loads/
> dTLB-load-misses OR cpu/dTLB-load-misses/
> iTLB-loads OR cpu/iTLB-loads/
> iTLB-load-misses OR cpu/iTLB-load-misses/
> branch-loads OR cpu/branch-loads/
> branch-load-misses OR cpu/branch-load-misses/
> branch-brs OR cpu/branch-brs/ [Kernel PMU event]
> branch-instructions OR cpu/branch-instructions/ [Kernel PMU event]
> branch-misses OR cpu/branch-misses/ [Kernel PMU event]
> ...
> ```
>
> Is this an intentional change?

Yep, but I think the commit message should call it out, so I'll change
it in v4. When we have an alias, the event type descriptor isn't shown.
This is pre-existing perf list behavior, but I think we may want to
tweak it as I like the event type descriptor.

> > - for (int type = 0; type < PERF_COUNT_HW_CACHE_MAX; type++) {
> > - for (int op = 0; op < PERF_COUNT_HW_CACHE_OP_MAX; op++) {
> > - /* skip invalid cache type */
> > - if (!evsel__is_cache_op_valid(type, op))
> > - continue;
> > + while ((pmu = perf_pmu__scan(pmu)) != NULL) {
> > + /*
> > + * Skip uncore PMUs for performance. Software PMUs can open
> > + * PERF_TYPE_HW_CACHE, so skip.
>
> This statement is a bit confusing. Can you please explain how SW PMUs can
> open cache events?

If the type is PERF_TYPE_HW_CACHE (3) and the extended type is
PERF_TYPE_SOFTWARE (1), then this yields the encoding '3:0x100000000',
for which perf_event_open will succeed:
```
$ perf stat -vv -e '3:0x100000000' true
Using CPUID GenuineIntel-6-8D-1
intel_pt default config: tsc,mtc,mtc_period=3,psb_period=3,pt,branch
Control descriptor is not initialized
------------------------------------------------------------
perf_event_attr:
type 3
size 128
config 0x100000000
sample_type IDENTIFIER
read_format TOTAL_TIME_ENABLED|TOTAL_TIME_RUNNING
disabled 1
inherit 1
enable_on_exec 1
exclude_guest 1
------------------------------------------------------------
sys_perf_event_open: pid 2128955 cpu -1 group_fd -1 flags 0x8 = 3
3:0x100000000: -1: 265630 261923 261923
3:0x100000000: 265630 261923 261923

Performance counter stats for 'true':

265,630 3:0x100000000

0.000844251 seconds time elapsed

0.000911000 seconds user
0.000000000 seconds sys
```
I agree this isn't expected, but if I don't exclude the PMU type, the
print events code will list it as an alias. I'll try to improve the
comment in v4.
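
For reference, a minimal sketch of how the '3:0x100000000' encoding above
is put together, assuming the extended PMU type sits in the upper 32 bits
of config and the legacy cache config is id | (op << 8) | (result << 16).
The EX_/ex_ names are made up for illustration and are not perf functions.
The constants are copied locally so the snippet builds without the UAPI
header:

```c
#include <stdint.h>
#include <stdio.h>

/* Local copies of the values discussed above; EX_ names are hypothetical. */
#define EX_PERF_TYPE_SOFTWARE  1u
#define EX_PERF_TYPE_HW_CACHE  3u
#define EX_PERF_PMU_TYPE_SHIFT 32 /* assumption: extended type in bits 32+ */

/* Legacy cache config: cache id | (op << 8) | (result << 16). */
static uint64_t ex_cache_config(uint64_t id, uint64_t op, uint64_t result)
{
	return id | (op << 8) | (result << 16);
}

/* Put the extended PMU type into the upper 32 bits of config. */
static uint64_t ex_extended_config(uint32_t pmu_type, uint64_t config)
{
	return ((uint64_t)pmu_type << EX_PERF_PMU_TYPE_SHIFT) | config;
}

/* Format as the raw "type:config" string used on the command line above. */
static int ex_format_event(char *buf, size_t len, uint32_t type,
			   uint64_t config)
{
	return snprintf(buf, len, "%u:0x%llx", type,
			(unsigned long long)config);
}
```

Calling ex_extended_config(EX_PERF_TYPE_SOFTWARE, ex_cache_config(0, 0, 0))
and formatting the result with type EX_PERF_TYPE_HW_CACHE gives the event
string passed to perf stat above (cache id, op and result all 0).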

Thanks,
Ian