RE: [RFC PATCH v4 2/6] perf stat: Fork and launch perf record when perf stat needs to get retire latency value for a metric.

From: Wang, Weilin
Date: Tue Mar 12 2024 - 20:26:49 EST




> -----Original Message-----
> From: Andi Kleen <ak@xxxxxxxxxxxxxxx>
> Sent: Tuesday, March 12, 2024 5:03 PM
> To: Wang, Weilin <weilin.wang@xxxxxxxxx>
> Cc: Namhyung Kim <namhyung@xxxxxxxxxx>; Ian Rogers
> <irogers@xxxxxxxxxx>; Arnaldo Carvalho de Melo <acme@xxxxxxxxxx>; Peter
> Zijlstra <peterz@xxxxxxxxxxxxx>; Ingo Molnar <mingo@xxxxxxxxxx>;
> Alexander Shishkin <alexander.shishkin@xxxxxxxxxxxxxxx>; Jiri Olsa
> <jolsa@xxxxxxxxxx>; Hunter, Adrian <adrian.hunter@xxxxxxxxx>; Kan Liang
> <kan.liang@xxxxxxxxxxxxxxx>; linux-perf-users@xxxxxxxxxxxxxxx; linux-
> kernel@xxxxxxxxxxxxxxx; Taylor, Perry <perry.taylor@xxxxxxxxx>; Alt, Samantha
> <samantha.alt@xxxxxxxxx>; Biggers, Caleb <caleb.biggers@xxxxxxxxx>
> Subject: Re: [RFC PATCH v4 2/6] perf stat: Fork and launch perf record when
> perf stat needs to get retire latency value for a metric.
>
> weilin.wang@xxxxxxxxx writes:
>
> > From: Weilin Wang <weilin.wang@xxxxxxxxx>
> >
> > When retire_latency value is used in a metric formula, perf stat would fork a
> > perf record process with "-e" and "-W" options. Perf record will collect
> > required retire_latency values in parallel while perf stat is collecting
> > counting values.
>
> How does that work when the workload is specified on the command line?
> The workload would run twice? That is very inefficient and may not
> work if it's a large workload.
>
> The perf tool infrastructure is imho not up to the task of such
> parallel collection.
>
> Also it won't work for very long collections because you will get a
> very large perf.data. Better to use a pipeline.
>
> I think it would be better if you made it a separate operation that can
> generate a file that is then consumed by perf stat. This is also more efficient
> because often the calibration is only needed once. And it's all under
> user control so no nasty surprises.
>

The workload runs only once, under perf stat. Perf stat forks perf record, and the
two run in parallel. Once perf stat stops collecting count values, it sends perf
record a signal to terminate.

The implementation uses a pipe to pass the sampled data from perf record to
perf stat instead of writing the data into a file.
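
To illustrate the general fork/pipe/signal pattern, here is a minimal standalone
sketch. It is not the patch code: collect_from_child() is a hypothetical helper,
the child loop only stands in for perf record, and the choice of SIGTERM is an
assumption made for the example.

```c
#define _GNU_SOURCE
#include <signal.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/*
 * Hypothetical helper, not taken from the patch.
 * Parent plays the role of perf stat, child the role of perf record.
 * Returns the number of bytes received over the pipe, or -1 on error.
 */
long collect_from_child(void)
{
	int fds[2];
	char buf[256];
	long total = 0;
	ssize_t n;
	int status;
	pid_t pid;

	if (pipe(fds) < 0)
		return -1;

	pid = fork();
	if (pid < 0) {
		close(fds[0]);
		close(fds[1]);
		return -1;
	}

	if (pid == 0) {
		/* Child: stream fake "samples" into the pipe until signalled. */
		close(fds[0]);
		for (;;) {
			if (write(fds[1], "sample\n", 7) < 0)
				_exit(1);
			usleep(1000);
		}
	}

	/* Parent: keep only the read end of the pipe. */
	close(fds[1]);

	/* Stand-in for the counting phase: wait for the first samples. */
	n = read(fds[0], buf, sizeof(buf));
	if (n > 0)
		total += n;

	/* Counting is done: ask the child to terminate (assumed SIGTERM). */
	kill(pid, SIGTERM);

	/* Drain the pipe; read() returns 0 once the write end is closed. */
	while ((n = read(fds[0], buf, sizeof(buf))) > 0)
		total += n;
	close(fds[0]);

	/* Reap the child and check that the signal is what ended it. */
	if (waitpid(pid, &status, 0) < 0)
		return -1;
	if (!WIFSIGNALED(status) || WTERMSIG(status) != SIGTERM)
		return -1;

	return total;
}
```

Here the child relies on the default SIGTERM action and simply dies. A real
collector would more likely install a handler so it can flush buffered data
before exiting. In both cases no intermediate file is written.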

Thanks,
Weilin

> -Andi