Re: [PATCH v1 01/40] perf stat: Introduce skippable evsels

From: Ian Rogers
Date: Thu Apr 27 2023 - 17:09:31 EST


On Thu, Apr 27, 2023 at 2:00 PM Namhyung Kim <namhyung@xxxxxxxxxx> wrote:
>
> Hello,
>
> On Thu, Apr 27, 2023 at 1:21 PM Ian Rogers <irogers@xxxxxxxxxx> wrote:
> >
> > On Thu, Apr 27, 2023 at 11:54 AM Liang, Kan <kan.liang@xxxxxxxxxxxxxxx> wrote:
> > >
> > >
> > >
> > > On 2023-04-26 3:00 a.m., Ian Rogers wrote:
> > > > Perf stat with no arguments will use default events and metrics. These
> > > > events may fail to open even with kernel and hypervisor disabled. When
> > > > these fail then the permissions error appears even though they were
> > > > implicitly selected. This is particularly a problem with the automatic
> > > > selection of the TopdownL1 metric group on certain architectures like
> > > > Skylake:
> > > >
> > > > ```
> > > > $ perf stat true
> > > > Error:
> > > > Access to performance monitoring and observability operations is limited.
> > > > Consider adjusting /proc/sys/kernel/perf_event_paranoid setting to open
> > > > access to performance monitoring and observability operations for processes
> > > > without CAP_PERFMON, CAP_SYS_PTRACE or CAP_SYS_ADMIN Linux capability.
> > > > More information can be found at 'Perf events and tool security' document:
> > > > https://www.kernel.org/doc/html/latest/admin-guide/perf-security.html
> > > > perf_event_paranoid setting is 2:
> > > > -1: Allow use of (almost) all events by all users
> > > > Ignore mlock limit after perf_event_mlock_kb without CAP_IPC_LOCK
> > > >> = 0: Disallow raw and ftrace function tracepoint access
> > > >> = 1: Disallow CPU event access
> > > >> = 2: Disallow kernel profiling
> > > > To make the adjusted perf_event_paranoid setting permanent preserve it
> > > > in /etc/sysctl.conf (e.g. kernel.perf_event_paranoid = <setting>)
> > > > ```
> > > >
> > > > This patch adds skippable evsels: when one fails to open, perf stat
> > > > doesn't fail and the evsel doesn't appear in the output. The TopdownL1
> > > > events, from the metric group, are marked as skippable. This turns the
> > > > failure above into:
> > > >
> > > > ```
> > > > $ perf stat true
> > > >
> > > > Performance counter stats for 'true':
> > > >
> > > > 1.26 msec task-clock:u # 0.328 CPUs utilized
> > > > 0 context-switches:u # 0.000 /sec
> > > > 0 cpu-migrations:u # 0.000 /sec
> > > > 49 page-faults:u # 38.930 K/sec
> > > > 176,449 cycles:u # 0.140 GHz (48.99%)
> > > > 122,905 instructions:u # 0.70 insn per cycle
> > > > 28,264 branches:u # 22.456 M/sec
> > > > 2,405 branch-misses:u # 8.51% of all branches
> > > >
> > > > 0.003834565 seconds time elapsed
> > > >
> > > > 0.000000000 seconds user
> > > > 0.004130000 seconds sys
> > > > ```
> > >
> > > If the same command runs with root permission, a different output will
> > > be displayed as below:
> > >
> > > $ sudo ./perf stat sleep 1
> > >
> > > Performance counter stats for 'sleep 1':
> > >
> > > 0.97 msec task-clock # 0.001 CPUs utilized
> > > 1 context-switches # 1.030 K/sec
> > > 0 cpu-migrations # 0.000 /sec
> > > 67 page-faults # 69.043 K/sec
> > > 1,135,552 cycles # 1.170 GHz (50.51%)
> > > 1,126,446 instructions # 0.99 insn per cycle
> > > 252,904 branches # 260.615 M/sec
> > > 7,297 branch-misses # 2.89% of all branches
> > > 22,518 CPU_CLK_UNHALTED.REF_XCLK # 23.205 M/sec
> > > 56,994 INT_MISC.RECOVERY_CYCLES_ANY # 58.732 M/sec
> > >
> > > The last two events are useless.
> >
> > You missed the system wide (-a) flag.
> >
> > Thanks,
> > Ian
> >
> > > Relying on perf_event_open()/the kernel to tell whether an event is
> > > available or skippable is not reliable. The kernel doesn't check a
> > > specific event.
> > >
> > > The patch only works in non-root mode because the event requires
> > > root permission, so the kernel rejects it for lack of permission. But
> > > if the same command runs with root privileges, the trash events are
> > > printed as above.
> > >
> > > I think a better way is to check the HW capability and decide whether
> > > to append the TopdownL1 metrics.
> > >
> > > https://lore.kernel.org/lkml/20230427182906.3411695-1-kan.liang@xxxxxxxxxxxxxxx/
>
> Maybe we can also check whether the event was actually enabled, e.g. by
> checking the enabled_time, and then skip the events that are skippable
> and not enabled.

Good idea, and I think that addresses Kan's concern over missing
output. I'll add it in v2.
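
For illustration, the check could look something like the sketch below.
It is untested, and the struct and function names are hypothetical
stand-ins, not the real perf evsel code. It only assumes that a counter
read reports an enabled time alongside the value, and it treats an
enabled time of zero as "never enabled".

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for a counter reading; not perf's real struct. */
struct sketch_reading {
	uint64_t val;          /* raw count */
	uint64_t time_enabled; /* time the event was enabled */
	uint64_t time_running; /* time the event was actually counting */
};

/* Hypothetical stand-in for the per-event state discussed in this thread. */
struct sketch_evsel {
	bool skippable; /* implicitly selected, e.g. from a default metric group */
	bool opened;    /* whether opening the event succeeded */
	struct sketch_reading count;
};

/*
 * Decide whether an event should appear in the output.
 * Events that are not skippable are always reported. Skippable
 * events are dropped if they failed to open or were never enabled.
 */
static bool sketch_should_print(const struct sketch_evsel *evsel)
{
	if (!evsel->skippable)
		return true;
	if (!evsel->opened)
		return false;
	return evsel->count.time_enabled != 0;
}
```

With this shape, a skippable event that opens but is never enabled
is dropped from the output in the same way as one that fails to open.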

Thanks,
Ian

> Thanks,
> Namhyung