Re: [PATCH] perf scripts python: Add a script to run instances of perf script in parallel

From: Andi Kleen
Date: Mon Mar 11 2024 - 12:13:53 EST


On Sun, Mar 10, 2024 at 09:35:02PM +0200, Adrian Hunter wrote:
> Add a Python script to run a perf script command multiple times in
> parallel, using perf script options --cpu and --time so that each job
> processes a different chunk of the data.
>
> Refer to the script's own help text at the end of the patch for more
> details.
>
> The script is useful for Intel PT traces, that can be efficiently
> decoded by perf script when split by CPU and/or time ranges. Running
> jobs in parallel can decrease the overall decoding time.

This only optimizes the run time of the decoder. Often when you do
analysis, a non-trivial part of the work is in some analysis script too,
but you currently have no direct / easy way to parallelize that. It would
be better to support parallel pipelines.

TBH I'm not sure the script is worth it. If you need to do parallel
pipelines (which imho is the common case) it's probably better to just
write a custom shell script, which is not that difficult. It might be
better to have a helper that makes writing such scripts easier,
e.g. figuring out reasonable options for manual parallelization
based on the input file. I think parts of your script do that, maybe
it is usable for that.
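
As a rough sketch of such a helper (the function names and the
./analyze.py stage are hypothetical and not part of the patch; only the
perf script --cpu and -i options are assumed), one pipeline per CPU could
be generated and run like this:

```python
import shlex
import subprocess
from concurrent.futures import ThreadPoolExecutor


def build_pipelines(cpus, analysis_cmd, perf_data="perf.data"):
    """Return one shell pipeline per CPU: perf script --cpu N | analysis_cmd.

    analysis_cmd is a hypothetical user-supplied analysis stage.
    """
    return [
        f"perf script -i {shlex.quote(perf_data)} --cpu {cpu} | {analysis_cmd}"
        for cpu in cpus
    ]


def run_parallel(pipelines, max_jobs=4):
    """Run shell pipelines concurrently.

    Returns a list of (returncode, stdout) tuples in the same order as
    the input pipelines, regardless of which job finishes first.
    """
    def run(cmd):
        proc = subprocess.run(cmd, shell=True, capture_output=True, text=True)
        return proc.returncode, proc.stdout

    with ThreadPoolExecutor(max_workers=max_jobs) as pool:
        return list(pool.map(run, pipelines))
```

With this, the analysis stage runs in parallel along with the decoder,
e.g. run_parallel(build_pipelines(range(4), "./analyze.py")).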

Also as a default output it would be better to just merge the
original output in order and output it on stdout.
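
A minimal sketch of that default, assuming each job has written its
output to its own file and that the files are listed in job order (the
merge_outputs name is hypothetical). Plain concatenation only preserves
the original ordering when the jobs were split by time range; a per-CPU
split would need a merge by timestamp instead.

```python
import shutil
import sys


def merge_outputs(paths, out=None):
    """Concatenate per-job output files onto one stream, in the order given."""
    if out is None:
        out = sys.stdout
    for path in paths:
        with open(path, "r") as job_output:
            # Stream the file across so large outputs are not held in memory.
            shutil.copyfileobj(job_output, out)
```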

You should probably limit the number of jobs so that each job has some
minimum length; otherwise, on systems with many CPUs, the jobs might be
inefficiently short.
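
For illustration, one way to apply such a limit, assuming job length is
measured in seconds of trace time (cap_jobs and the 1 second default are
hypothetical, not taken from the patch):

```python
def cap_jobs(total_seconds, requested_jobs, min_job_seconds=1.0):
    """Reduce the job count so each job covers at least min_job_seconds."""
    if total_seconds <= 0 or requested_jobs < 1:
        return 1
    # Largest number of jobs that still gives every job the minimum length.
    max_by_length = max(1, int(total_seconds // min_job_seconds))
    return min(requested_jobs, max_by_length)
```

So a 10 second trace on a 64 CPU system would be split into at most 10
jobs rather than 64.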

-Andi