Re: [PATCH v4 3/4] perf/core: Remove pmu linear searching code

From: Thomas Richter
Date: Thu Jun 01 2023 - 07:06:17 EST


On 5/31/23 22:20, Ian Rogers wrote:
> On Wed, May 31, 2023 at 2:09 AM Thomas Richter <tmricht@xxxxxxxxxxxxx> wrote:
>>
>> On 5/30/23 16:00, Ian Rogers wrote:
>>> ls /sys/devices/*/cpu*
>>
>> Hi Ian,
>>
>> here is the output yo requested:
>>
>> # ls /sys/devices/*/cpu*
>> cpu0 cpu3 cpu6 hotplug modalias possible smt
>> cpu1 cpu4 cpu7 isolated offline present uevent
>> cpu2 cpu5 dispatching kernel_max online rescan vulnerabilities
>> #
>>
>> In fact it is the same as
>> # ls /sys/devices/system/cpu/
>> cpu0 cpu3 cpu6 hotplug modalias possible smt
>> cpu1 cpu4 cpu7 isolated offline present uevent
>> cpu2 cpu5 dispatching kernel_max online rescan vulnerabilities
>> #
>> This directory tree has nothing to do with events for perf, it is
>> merely used to support CPU hotplug on s390.
>>
>> The PMUs on s390 are
>> # ls -ld /sys/devices/{cpum_,pai_}*
>> drwxr-xr-x 4 root root 0 May 25 15:45 /sys/devices/cpum_cf
>> drwxr-xr-x 4 root root 0 May 25 15:45 /sys/devices/cpum_cf_diag
>> drwxr-xr-x 4 root root 0 May 25 15:45 /sys/devices/cpum_sf
>> drwxr-xr-x 4 root root 0 May 25 15:45 /sys/devices/pai_crypto
>> drwxr-xr-x 4 root root 0 May 25 15:45 /sys/devices/pai_ext
>> #
>>
>> I hope his helps.
>
> Thanks Thomas,
>
> The perf tool has a notion of "core" and "other" PMUs - other means
> uncore and things like interconnect PMUs. The distinction matters as
> PMUs may have a list of CPUs with them, for "other" PMUs the CPU list
> (known in the tool as the CPU map) is a suggestion of the CPU to open
> events on. For example, on my laptop:
> ```
> $ cat /sys/devices/system/cpu/online
> 0-15
> $ cat /sys/devices/uncore_imc_free_running_0/cpumask
> 0
> ```
> So I have 16 "CPUs" and the memory controller is suggesting opening
> the events on CPU 0. However, if I do:
> ```
> $ sudo perf stat -e 'uncore_imc_free_running_0/data_read/' -C 8 -a -A sleep 1
>
> Performance counter stats for 'system wide':
>
> CPU8 4,617.60 MiB
> uncore_imc_free_running_0/data_read/
>
>
> 1.001094684 seconds time elapsed
> ```
> Then things are good and the event was recorded on CPU 8, even though
> the cpumask of the PMU only said CPU 0.
>
> For "core" PMUs the CPU map works differently. A CPU outside of the
> map is an error. For example, if I have a heterogeneous system with 2
> CPUs, the first CPU on 1 PMU and the second CPU on a different PMU, if
> I try to open events for the wrong PMU type for the CPU then it
> should fail. The CPU map should be interpreted as the set of CPUs
> events are valid on.
>
> The logic to determine if a PMU is "core" or "other" is here:
> https://git.kernel.org/pub/scm/linux/kernel/git/acme/linux.git/tree/tools/perf/util/pmu.c?h=perf-tools-next#n1417
> https://git.kernel.org/pub/scm/linux/kernel/git/acme/linux.git/tree/tools/perf/util/pmu.c?h=perf-tools-next#n609
>
> That is, a PMU is a "core" PMU if it is called "cpu" or if there is a
> file called "cpus" inside the PMU's sysfs directory. It seems on s390
> none of the PMUs are considered "core" and this is why we're having
> issues.
>
> Looking at:
> https://lore.kernel.org/lkml/20180416132314.33249-1-tmricht@xxxxxxxxxxxxx/
>
> It would seem to make sense that "cpu", "cpum_cf" and "cpum_sf" should
> all cause "is_pmu_core" to return true, From your output I suspect the
> issue is that cpum_cf and cpum_sf both lack the "cpus" file - which is
> also true on homogeneous x86's "cpu" PMU. We can fix the code with:
>
> ```
> diff --git a/tools/perf/util/pmu.c b/tools/perf/util/pmu.c
> index 0520aa9fe991..bdc3f3b148fc 100644
> --- a/tools/perf/util/pmu.c
> +++ b/tools/perf/util/pmu.c
> @@ -1414,7 +1414,9 @@ void perf_pmu__del_formats(struct list_head *formats)
>
> bool is_pmu_core(const char *name)
> {
> - return !strcmp(name, "cpu") || is_sysfs_pmu_core(name);
> + return !strcmp(name, "cpu") ||
> + !strcmp(name, "cpum_cf") || !strcmp(name, "cpum_sf") ||
> + is_sysfs_pmu_core(name);
> }
>
> bool perf_pmu__supports_legacy_cache(const struct perf_pmu *pmu)
> ```
>
> Wdyt? Thanks,
> Ian
>

Ian, thanks very much for your help. I changed the code as you
suggested and added s390's PMU cpum_cf as in

bool is_pmu_core(const char *name)
{
return !strcmp(name, "cpu") || !strcmp(name, "cpum_cf") ||
is_sysfs_pmu_core(name);
}

The result is much better, now the
- test case 6.1 fails only 2 times and for a different reason:
running test 2 'r1a'
FAILED tests/parse-events.c:125 Raw PMU not matched
Event test failure: test 2 'r1a' rc -1
...
running test 14 'r1a:kp'
FAILED tests/parse-events.c:125 Raw PMU not matched
Event test failure: test 14 'r1a:kp' rc -1

- test case 10.2 fails for an unknown event bp_l1_btb_correct.
10.2: PMU event map aliases :
--- start ---
Using CPUID IBM,8561,703,T01,3.6,002f
testing aliases core PMU cpum_cf: no alias, alias_table->name=bp_l1_btb_correct
testing core PMU cpum_cf aliases: failed
---- end ----
PMU events subtest 2: FAILED!
- test case 68 now succeeds:
# ./perf test 68
68: Parse and process metrics : Ok
#

Regarding test case 6.1:
Raw PMU not matched is printed in tests/parse-events.c::125
Debugging revealed that function
test_event()
+--> parse_events() which returns 0
+--> e->check() ---> test__checkevent_raw()
This function has TEST_ASSERT_VAL("Raw PMU not matched", raw_type_match);
triggering -1 because the PMU has type 8 which is not PERF_TYPE_RAW

Maybe make type >= PERF_TYPE_RAW to succeed the test?

Regarding test case 10.2: the event bp_l1_btb_correct does not exist on
s390. It is defined in
- pmu-events/empty-pmu-events.c
- pmu-events/arch/test/test_soc/cpu/branch.json
Not sure which one is used.
But this event is unknown on s390

What is the best way to fix these issues.

PS: I have the feeling, it gets complicated to have multiple hardware PMUs
per platform.

--
Thomas Richter, Dept 3303, IBM s390 Linux Development, Boeblingen, Germany
--
Vorsitzender des Aufsichtsrats: Gregor Pillen
Geschäftsführung: David Faller
Sitz der Gesellschaft: Böblingen / Registergericht: Amtsgericht Stuttgart, HRB 243294