Re: [RFC PATCH v1 2/5] perf stat: Fork and launch perf record when perf stat needs to get retire latency value for a metric.

From: Namhyung Kim
Date: Fri Feb 23 2024 - 21:45:27 EST


On Thu, Feb 22, 2024 at 11:48 PM Ian Rogers <irogers@xxxxxxxxxx> wrote:
>
> On Thu, Feb 22, 2024 at 11:03 PM Namhyung Kim <namhyung@xxxxxxxxxx> wrote:
> >
> > Hi,
> >
> > On Wed, Feb 21, 2024 at 12:34 PM Ian Rogers <irogers@xxxxxxxxxx> wrote:
> > > Weilin raised the TPEBS problem in the LPC 2023 talk, the issue being
> > > that sampling and counting don't really exist in the current perf tool
> > > code at the same time. BPF could be a workaround but permissions are
> > > an issue. Perhaps leader sampling but then what to do if two latencies
> > > are needed. Forking perf to do this is an expedient and ideally we'd
> > > not do it.
> >
> > Even with BPF, I think it needs two instances of an event - one for
> > counting and the other for sampling, right? I wonder if it can just
> > use a single event for sampling and show the sum of periods in
> > PERF_SAMPLE_READ.
> >
> > I'm not sure if an event group can have sampling and non-sampling
> > events at the same time. But it can be done without groups then.
> > Anyway what's the issue with two latencies?
>
> The latencies come from samples and with leader sampling only the
> leader gets sampled so we can't get two latencies. For 2 latencies
> we'd need 2 groups for 2 leaders or to modify leader sampling

Do those 2 latencies come from 2 events or a single event?

But I realized that PERF_SAMPLE_READ would return the period
only and I guess the latency is in PERF_SAMPLE_WEIGHT(_STRUCT), right?
Then it won't work with PERF_SAMPLE_READ unless we extend the
read format to include the weights.
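
To make the distinction concrete, here is a minimal sketch of an attr that
asks for both pieces of data in each sample. The helper names
init_latency_attr() and weight_var3() are hypothetical, and reading the
retire latency from var3_w is an assumption about how the PMU driver fills
in the weight; the thread itself only says the latency is in
PERF_SAMPLE_WEIGHT(_STRUCT).

```c
#include <linux/perf_event.h>
#include <stdint.h>
#include <string.h>

/*
 * Hypothetical helper: set up a sampling attr whose samples carry both
 * the counter value (PERF_SAMPLE_READ) and the weight fields
 * (PERF_SAMPLE_WEIGHT_STRUCT). The raw event encoding is supplied by
 * the caller; nothing here is specific to a particular event.
 */
static void init_latency_attr(struct perf_event_attr *attr,
			      uint64_t raw_config, uint64_t period)
{
	memset(attr, 0, sizeof(*attr));
	attr->size = sizeof(*attr);
	attr->type = PERF_TYPE_RAW;
	attr->config = raw_config;
	attr->sample_period = period;
	/* READ gives the count, WEIGHT_STRUCT gives the latency fields. */
	attr->sample_type = PERF_SAMPLE_READ | PERF_SAMPLE_WEIGHT_STRUCT;
	attr->disabled = 1;
}

/*
 * Extract the var3_w field from the 64-bit weight word of a sample.
 * Assumption: this is the field that carries the retire latency.
 */
static uint16_t weight_var3(uint64_t raw_weight)
{
	union perf_sample_weight w = { .full = raw_weight };

	return w.var3_w;
}
```

With this setup the count and the weight arrive as separate fields in the
sample record. A plain read(2) on the event fd still returns only what
read_format describes, which has no weight field today.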

> - if we
> could encode that we want to sample but don't need the sample in the
> mmap, just want the latency being available to be read, etc. This and
> BPF are both long-term viable solutions, but forking is the expedient
> solution to get something going - we'd likely want it as a fallback
> anyway.

Maybe we can add it to the read format, but I'm not sure how the
kernel maintains the value. PERF_SAMPLE_READ would be fine
to return the value in the sample. But it should support read(2) too.

Simply adding the values might not be what users want. Maybe an
average latency/weight is meaningful, but that could depend on
what the event measures.
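
As a rough illustration of sum versus average, the sketch below keeps a
running sum and a sample count so that either value can be reported. The
struct and function names are hypothetical; this is not how the kernel
maintains anything today.

```c
#include <stdint.h>

/* Hypothetical accumulator, not an existing kernel or perf structure. */
struct weight_acc {
	uint64_t sum;	/* sum of all sampled weights */
	uint64_t count;	/* number of samples accumulated */
};

/* Fold one sampled weight into the accumulator. */
static void weight_acc_add(struct weight_acc *acc, uint64_t weight)
{
	acc->sum += weight;
	acc->count++;
}

/* Average weight per sample, or 0 if no sample was seen. */
static double weight_acc_avg(const struct weight_acc *acc)
{
	return acc->count ? (double)acc->sum / (double)acc->count : 0.0;
}
```

Keeping both fields leaves the choice to the consumer: the sum alone cannot
be turned into an average afterwards without the sample count.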

Thanks,
Namhyung