Re: [PATCH V2] perf top: Use evsel's cpus to replace user_requested_cpus

From: Liang, Kan
Date: Wed Dec 13 2023 - 16:22:12 EST




On 2023-12-13 12:42 p.m., Ian Rogers wrote:
> On Wed, Dec 13, 2023 at 7:45 AM Liang, Kan <kan.liang@xxxxxxxxxxxxxxx> wrote:
>>
>>
>> On 2023-12-12 8:06 p.m., Namhyung Kim wrote:
>>> On Tue, Dec 12, 2023 at 2:12 PM Ian Rogers <irogers@xxxxxxxxxx> wrote:
>>>> On Tue, Dec 12, 2023 at 1:25 PM Liang, Kan <kan.liang@xxxxxxxxxxxxxxx> wrote:
>>>>>
>>>>>
>>>>> On 2023-12-12 3:37 p.m., Ian Rogers wrote:
>>>>>> On Tue, Dec 12, 2023 at 11:39 AM <kan.liang@xxxxxxxxxxxxxxx> wrote:
>>>>>>> From: Kan Liang <kan.liang@xxxxxxxxxxxxxxx>
>>>>>>>
>>>>>>> perf top errors out on a hybrid machine
>>>>>>> $perf top
>>>>>>>
>>>>>>> Error:
>>>>>>> The cycles:P event is not supported.
>>>>>>>
>>>>>>> The perf top expects that the "cycles" is collected on all CPUs in the
>>>>>>> system. But for hybrid there is no single "cycles" event which can cover
>>>>>>> all CPUs. Perf has to split it into two cycles events, e.g.,
>>>>>>> cpu_core/cycles/ and cpu_atom/cycles/. Each event has its own CPU mask.
>>>>>>> If a event is opened on the unsupported CPU. The open fails. That's the
>>>>>>> reason of the above error out.
>>>>>>>
>>>>>>> Perf should only open the cycles event on the corresponding CPU. The
>>>>>>> commit ef91871c960e ("perf evlist: Propagate user CPU maps intersecting
>>>>>>> core PMU maps") intersect the requested CPU map with the CPU map of the
>>>>>>> PMU. Use the evsel's cpus to replace user_requested_cpus.
>>>>>>>
>>>>>>> The evlist's threads are also propagated to the evsel's threads in
>>>>>>> __perf_evlist__propagate_maps(). For a system-wide event, perf appends
>>>>>>> a dummy event and assign it to the evsel's threads. For a per-thread
>>>>>>> event, the evlist's thread_map is assigned to the evsel's threads. The
>>>>>>> same as the other tools, e.g., perf record, using the evsel's threads
>>>>>>> when opening an event.
>>>>>>>
>>>>>>> Reported-by: Arnaldo Carvalho de Melo <acme@xxxxxxxxxx>
>>>>>>> Closes: https://lore.kernel.org/linux-perf-users/ZXNnDrGKXbEELMXV@xxxxxxxxxx/
>>>>>>> Reviewed-by: Ian Rogers <irogers@xxxxxxxxxx>
>>>>>>> Signed-off-by: Kan Liang <kan.liang@xxxxxxxxxxxxxxx>
>>>>>>> ---
>>>>>>>
>>>>>>> Changes since V1:
>>>>>>> - Update the description
>>>>>>> - Add Reviewed-by from Ian
>>>>>> Thanks Kan, quick question. Does "perf top" on hybrid ask the user to
>>>>>> select between the cycles event on cpu_atom and cpu_core?
>>>>> Yes, but the event doesn't include the PMU information.
>>>>> We probably need a follow up patch to append the PMU name.
>>>>>
>>>>> Available samples
>>>>> 385 cycles:P
>>>>>
>>>>> 903 cycles:P
>>>> Thanks and agreed, it isn't possible to tell which is which PMU/CPU
>>>> type at the moment. I tried the patch with perf top --stdio, there
>>>> wasn't a choice of event
>> The perf top --stdio uses a dedicated display function, see
>> perf_top__header_snprintf() in util/top.c
>>
>> It only shows one event at a time. "E" is used to switch the event.
>>
>>>> and I can't tell what counter is being
>>>> displayed.
>> For the hybrid case, I think we may display both PMU name and event
>> name. For example,
>>
>> Available samples
>> 656 cycles:P cpu_atom
>>
>> 701 cycles:P cpu_core
> I think there would be more uniformity if we could do:
> cpu/cycles/P
> I'm just reminded of the stat output where sometimes you can get things like:
> cpu/cycles/
> or sometimes:
> cycles [cpu]
> It seems the slash style approach is the most uniform given it looks
> like what is written on the command line. Perhaps we need some kind of
> helper function to do this as reformatting the modifier is a pain.

Actually, we already have a helper in the perf record,
record__uniquify_name().
I can move it to the util/record.c and let's perf top invoke it before
the open as well. Then we can get this in the perf top.

Available samples
148 cpu_atom/cycles:P/

1K cpu_core/cycles:P/

It should be good enough to distinguish the events.
I will send a patch to support it.

Thanks,
Kan