Re: [PATCH v4 0/4] perf: Support PERF_SAMPLE_READ with inherit_stat

From: Ben Gainey
Date: Mon Mar 25 2024 - 10:19:21 EST


On Fri, 2024-03-22 at 18:22 -0700, Namhyung Kim wrote:
> On Fri, Mar 22, 2024 at 9:42 AM Ben Gainey <ben.gainey@xxxxxxx>
> wrote:
> >
> > This change allows events to use PERF_SAMPLE READ with inherit so
> > long
> > as both inherit_stat and PERF_SAMPLE_TID are set.
> >
> > Currently it is not possible to use PERF_SAMPLE_READ with inherit.
> > This
> > restriction assumes the user is interested in collecting aggregate
> > statistics as per `perf stat`. It prevents a user from collecting
> > per-thread samples using counter groups from a multi-threaded or
> > multi-process application, as with `perf record -e '{....}:S'`.
> > Instead
> > users must use system-wide mode, or forgo the ability to sample
> > counter
> > groups. System-wide mode is often problematic as it requires
> > specific
> > permissions (no CAP_PERFMON / root access), or may lead to capture
> > of
> > significant amounts of extra data from other processes running on
> > the
> > system.
> >
> > Perf already supports the ability to collect per-thread counts with
> > `inherit` via the `inherit_stat` flag. This patch changes
>
> I'm not sure about this part. IIUC inherit and inherit_stat is not
> for
> per-thread counts, it only supports per-process (including children)
> events.

Hi Namhyung

Thanks for the comments...

I don't think this is correct, if you compare the behaviour of

perf record --no-inherit ... <some-forking-processes>
perf script -F pid,tid | sort -u
and
perf record --no-inherit ... <some-multithreaded-processes>
perf script -F pid,tid | sort -u

vs

perf record ... <some-forking-processes>
perf script -F pid,tid | sort -u
and
perf record .. <some-multithreaded-processes>
perf script -F pid,tid | sort -u

The behaviour is consistent with the fact that no-inherit only records
the primary thread of the primary process, whereas in the inherit case
any child tasks (either threads or forked processes) is recorded.


>
>
> > `perf_event_alloc` relaxing the restriction to combine `inherit`
> > with
> > `PERF_SAMPLE_READ` so that the combination will be allowed so long
> > as
> > `inherit_stat` and `PERF_SAMPLE_TID` are enabled.
>
> Anyway, does it really need 'inherit_stat'? I think it's only for
> counting use cases (e.g. 'perf stat') not for sampling.


I would be very happy to remove the inherit_stat requirement. When I
first came to this it seemed like the logic was all there in
inherit_stat already, but now that I have to take a different path in
`perf_event_context_sched_out` I suspect it should be trivial to remove
the inherit_stat requirement.


>
> Also technically, it can have PERF_SAMPLE_STREAM_ID instead
> of PERF_SAMPLE_TID to distinguish the counter values.


It looks like you are correct, but the ID given in the read_format part
of PERF_SAMPLE_RECORD is the ID rather than STREAM_ID. (I had
incorrectly thought/stated it was the latter). Hence when processing
the read_format values in the sample record, we either need to use the
TID to uniquely identify the source, or we would need to modify the
read_format to (additionally) include the STREAM_ID.

* The current approach in tools uses the ID+TID, which puts more
complexity in the tools but means there isn't an extra field in the
read_format data (for each value).
* Alternatively I could introduce a PERF_FORMAT_STREAM_ID; I would
expect that the user/tool would need to specify
PERF_FORMAT_ID|PERF_FORMAT_STREAM_ID as they would need to use the ID
to lookup the correct perf_event_attr, but could use the STREAM_ID to
uniquely identify the child event. This approach would add an extra u64
per value in the read_format data but is possibly simpler/safer for
tools?

Any preferences?


>
> >
> > In this configuration stream ids (such as may appear in the
> > read_format
> > field of a PERF_RECORD_SAMPLE) are no longer globally unique,
> > rather
> > the pair of (stream id, tid) uniquely identify each event. Tools
> > that
> > rely on this, for example to calculate a delta between samples,
> > would
> > need updating to take this into account. Previously valid event
> > configurations (system-wide, no-inherit and so on) where each
> > stream id
> > is the identifier are unaffected.
>
> I think you meant PERF_SAMPLE_ID not PERF_SAMPLE_STREAM_ID.
> IIUC the stream id is already unique.
>

Yes my mistake.


> Thanks,
> Namhyung
>
> >
> >
> > Changes since v3:
> > - Cleaned up perf test data changes incorrectly included into this
> > series from elsewhere.
> >
> > Changes since v2:
> > - Rebase on v6.8
> > - Respond to James Clarke's feedback; fixup some typos and move
> > some
> > repeated checks into a helper macro.
> > - Cleaned up checkpatch lints.
> > - Updated perf test; fixed evsel handling so that existing tests
> > pass
> > and added new tests to cover the new behaviour.
> >
> > Changes since v1:
> > - Rebase on v6.8-rc1
> > - Fixed value written into sample after child exists.
> > - Modified handling of switch-out so that context with these
> > events
> > take the slow path, so that the per-event/per-thread PMU state
> > is
> > correctly switched.
> > - Modified perf tools to support this mode of operation.
> >
> > Ben Gainey (4):
> > perf: Support PERF_SAMPLE_READ with inherit_stat
> > tools/perf: Track where perf_sample_ids need per-thread periods
> > tools/perf: Correctly calculate sample period for inherited
> > SAMPLE_READ values
> > tools/perf: Allow inherit + inherit_stat + PERF_SAMPLE_READ when
> > opening events
> >
> > include/linux/perf_event.h | 1 +
> > kernel/events/core.c | 62 ++++++++++----
> > tools/lib/perf/evlist.c | 1 +
> > tools/lib/perf/evsel.c | 48 +++++++++++
> > tools/lib/perf/include/internal/evsel.h | 55 ++++++++++++-
> > .../test-record-group-sampling-inherit-stat | 62 ++++++++++++++
> > tools/perf/util/evsel.c | 82
> > ++++++++++++++++++-
> > tools/perf/util/evsel.h | 1 +
> > tools/perf/util/session.c | 11 ++-
> > 9 files changed, 301 insertions(+), 22 deletions(-)
> > create mode 100644 tools/perf/tests/attr/test-record-group-
> > sampling-inherit-stat
> >
> > --
> > 2.44.0
> >

IMPORTANT NOTICE: The contents of this email and any attachments are confidential and may also be privileged. If you are not the intended recipient, please notify the sender immediately and do not disclose the contents to any other person, use it for any purpose, or store or copy the information in any medium. Thank you.