Re: [PATCHSET v4 next 0/3] perf stat: add user_time and system_time tool events

From: Florian Fischer
Date: Tue Apr 26 2022 - 05:39:12 EST


On 25.04.2022 12:06, Namhyung Kim wrote:
> On Sat, Apr 23, 2022 at 5:16 AM Florian Fischer
> <florian.fischer@muhq.space> wrote:
> >
> > On 22.04.2022 16:52, Namhyung Kim wrote:
> > > Hello,
> > >
> > > On Fri, Apr 22, 2022 at 3:05 PM Arnaldo Carvalho de Melo
> > > <acme@xxxxxxxxxx> wrote:
> > > >
> > > > Em Wed, Apr 20, 2022 at 12:23:51PM +0200, Florian Fischer escreveu:
> > > > > This patch series adds new internal events to perf stat exposing the times spend
> > > > > in user and kernel mode in nanoseconds reported by rusage.
> > > > >
> > > > > During some benchmarking using perf it bothered me that I could not easily
> > > > > retrieve those times from perf stat when using the machine readable output.
> > > > >
> > > > > But perf definitely knows about those values because in the human readable output
> > > > > they are present.
> > > > >
> > > > > Therefore I exposed the times reported by rusage via the new tool events:
> > > > > user_time and system_time.
> > > > >
> > > > > This allows to retrieved them in machine-readable output:
> > > > >
> > > > > $ ./perf stat -x, -e duration_time,user_time,system_time,cache-misses -- grep -q -r duration_time tools/perf
> > > > > 72134524,ns,duration_time:u,72134524,100.00,,
> > > > > 65225000,ns,user_time:u,65225000,100.00,,
> > > > > 6865000,ns,ssystem_time:u,6865000,100.00,,
>
> > > Anyway it looks a little bit strange to me if we can get
> > > system time in user mode only (the 'u' modifier).
> >
> > Sorry but I don't really understand what you mean.
> > The system_time is reported to userspace via rusage filled by wait4(2).
> > It will always report the value reported to the user space regardless of what
> > counters perf has access to.
> >
> > If you run perf as user you get the same system_time (but with the ':u' suffix)
> > as when you run perf as root or lower kernel.perf_event_paranoid to allow access
> > to more counters.
>
> The ':u' modifier means that the event should count only in user mode.
> So I think system_time:u should be 0 by definition.
> Likewise, user_time:k should be handled as such.

Not sure if this logic applies to tool events because the time spend in user mode
or kernel mdoe stays the same regardless of where you count it.

This is consistent with how duration_time works currently in perf stat.

./perf stat -e duration_time,duration_time:u,duration_time:k -- grep -q -r duration_time .

Performance counter stats for 'grep -q -r duration_time .':

29,012,558 ns duration_time
29,012,558 ns duration_time:u
29,012,558 ns duration_time:k

0.029012558 seconds time elapsed

0.029088000 seconds user
0.000000000 seconds sys

>
> But as I said before, we already have the task-clock event, so it's not
> clear why we need this too. Also these tool events won't work for
> other use cases like perf record.

Maybe I do not know enough about perf but I think this is not exactly how those
suffixes work. This is the output of perf of next-20220420 and my rusage tool
event patches applied:

$ sysctl kernel.perf_event_paranoid
kernel.perf_event_paranoid = -1

linux/tools/perf$ ./perf stat -e task-clock,task-clock:u,task-clock:k,\
duration_time,duration_time:u,duration_time:k,\
system_time,system_time:u,system_time:k,\
user_time,user_time:u,user_time:k -- grep -q -r duration_time .

Performance counter stats for 'grep -q -r duration_time .':

42.30 msec task-clock # 0.483 CPUs utilized
42.30 msec task-clock:u # 0.483 CPUs utilized
42.30 msec task-clock:k # 0.483 CPUs utilized
87,533,340 ns duration_time # 2.069 G/sec
87,533,340 ns duration_time:u # 2.069 G/sec
87,533,340 ns duration_time:k # 2.069 G/sec
12,841,000 ns system_time # 303.547 M/sec
12,841,000 ns system_time:u # 303.547 M/sec
12,841,000 ns system_time:k # 303.547 M/sec
30,820,000 ns user_time # 728.550 M/sec
30,820,000 ns user_time:u # 728.550 M/sec
30,820,000 ns user_time:k # 728.550 M/sec

0.087533340 seconds time elapsed

0.030820000 seconds user
0.012841000 seconds sys


So task-clock regardless of its ':u' or ':k' suffix shows approximately the wall
clock time.

But for my experiments on a shared system where kernel.perf_event_paranoid is
not set to -1 and I feel uncomfortable to test research prototypes as root I
would like a simple and fast way to tell where my program spends its time.

This can be achieved by using user_time and system_time.
Which are reported in the human readable output anyway.

Maybe I am doing something wrong here but as shown above task-clock does not
provide the information I am interested in.

Florian Fischer