Re: [PATCH RFC 0/5] perf: Add ioctl to emit sideband events

From: Ian Rogers
Date: Mon Apr 17 2023 - 12:38:01 EST


On Mon, Apr 17, 2023 at 4:02 AM Peter Zijlstra <peterz@xxxxxxxxxxxxx> wrote:
>
> On Fri, Apr 14, 2023 at 11:22:55AM +0300, Adrian Hunter wrote:
> > Hi
> >
> > Here is a stab at adding an ioctl for sideband events.
> >
> > This is to overcome races when reading the same information
> > from /proc.
>
> What races? Are you talking about reading old state in /proc the kernel
> delivering a sideband event for the new state, and then you writing the
> old state out?
>
> Surely that's something perf tool can fix without kernel changes?

So my reading is that during event synthesis there are races between
reading the different /proc files. This reminds me that there is
still, I believe, a race in perf record/top with uid filtering. The
uid filtering race is that we scan /proc to find the processes (pids)
for a uid and then synthesize the maps for each of these pids, but if
a pid starts or exits in between we either error out or don't sample
that pid. I believe the error-out behavior is easy to hit 100% of the
time, making uid mode of limited use.
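
To make the window concrete, here is a minimal sketch of the
scan-then-read pattern in Python. It is not the perf tool's code: the
function names and the "proc" parameter are made up for illustration,
and it assumes that the owner of /proc/<pid> is a good enough stand-in
for the process uid.

```python
import os


def pids_for_uid(uid, proc="/proc"):
    """Step 1: scan proc for numeric directories owned by uid."""
    pids = []
    for name in os.listdir(proc):
        if not name.isdigit():
            continue
        try:
            if os.stat(os.path.join(proc, name)).st_uid == uid:
                pids.append(int(name))
        except (FileNotFoundError, ProcessLookupError):
            # The process exited between listdir() and stat().
            continue
    return pids


def read_maps(pid, proc="/proc"):
    """Step 2: read the maps file, or return None if the pid is gone."""
    try:
        with open(os.path.join(proc, str(pid), "maps")) as f:
            return f.read()
    except (FileNotFoundError, ProcessLookupError, PermissionError):
        # Exited (or changed credentials) after the scan: skip the pid
        # rather than failing the whole synthesis pass.
        return None


def synthesize(uid, proc="/proc"):
    """Collect maps text for every pid of uid that is still readable."""
    maps = {}
    for pid in pids_for_uid(uid, proc):
        text = read_maps(pid, proc)
        if text is not None:
            maps[pid] = text
    return maps
```

Skipping vanished pids avoids the error-out case, but a pid that
starts after the scan is still missed, so user space alone only
narrows the race.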

This may be for something other than synthesis, but for synthesis a
few points are:
- as servers get bigger and consequently more jobs get consolidated
on them, synthesis is slow (hence --num-thread-synthesize) and the
synthesized events also dominate the perf.data file - perhaps >90% of
the file size, and a lot of that will be for processes with no samples
in them. Another issue here is that all those file descriptors don't
come for free in the kernel.
- BPF has buildid+offset stack traces that remove the need for
synthesis at the cost of more expensive stack generation. I believe
this is unpopular as adding this as a variant for every kind of event
would be hard, but perhaps we can cover some low-hanging fruit like
instructions and cycles.
- I believe Jiri looked at doing synthesis with BPF. Perhaps we could
do something similar to off-cpu and tail-synthesize, where more
things happen at the tail end of perf. Off-cpu records data in BPF
maps that it then synthesizes into samples.
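
As a rough illustration of the "do it at the tail end" idea, and of
why dropping events for processes with no samples could shrink the
file, here is a small Python sketch. It is purely hypothetical: the
record layout (dicts with a "pid" key) and the function name are
invented and do not correspond to perf's data structures.

```python
def tail_synthesize(samples, mmap_records):
    """Keep only the mmap records whose pid has at least one sample.

    Both arguments are lists of dicts with a "pid" key; this layout is
    an assumption made for the sketch, not perf's format.
    """
    sampled_pids = {s["pid"] for s in samples}
    return [m for m in mmap_records if m["pid"] in sampled_pids]
```

The point of deferring is that the set of sampled pids is only known
once recording has finished, so the filter cannot be applied up front.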

There is also a long-standing issue around not sampling munmap (or
mremap) that causes plenty of problems. Perhaps if we had fewer mmap
events in the perf.data file we could add these.

Thanks,
Ian