Re: [PATCH] perf scripts python: Add a script to run instances of perf script in parallel

From: Adrian Hunter
Date: Mon Mar 11 2024 - 13:52:19 EST


On 11/03/24 18:13, Andi Kleen wrote:
> On Sun, Mar 10, 2024 at 09:35:02PM +0200, Adrian Hunter wrote:
>> Add a Python script to run a perf script command multiple times in
>> parallel, using perf script options --cpu and --time so that each job
>> processes a different chunk of the data.
>>
>> Refer to the script's own help text at the end of the patch for more
>> details.
>>
>> The script is useful for Intel PT traces, which can be efficiently
>> decoded by perf script when split by CPU and/or time ranges. Running
>> jobs in parallel can decrease the overall decoding time.
>
> This only optimizes for the run time of the decoder. Often when you do
> analysis you have a non trivial part of it in some analysis script too,
> but you currently have no direct / easy way to parallelize that. It would
> be better to support parallel pipelines.

It will parallelize any scripts and / or dlfilters that perf script
itself executes.

>
> TBH I'm not sure the script is worth it. If you need to do parallel
> pipelines (which imho is the common case) it's probably better to just
> write a custom shell script, which is not that difficult.

It can be a pain to figure out how best to split the data if it is not
evenly distributed.
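
As a rough illustration of why uneven data makes splitting awkward, here
is a minimal sketch, not taken from the patch. It assumes a sorted list
of event timestamps is already available, and split_by_count() is a
hypothetical helper name. It picks time boundaries so that each job gets
about the same number of events rather than the same time span.

```python
def split_by_count(timestamps, n_jobs):
    """Return n_jobs - 1 boundary timestamps giving roughly equal event counts.

    Assumption: 'timestamps' is a non-empty list sorted in ascending order.
    """
    n = len(timestamps)
    # Boundary i is the timestamp of the first event belonging to job i.
    return [timestamps[i * n // n_jobs] for i in range(1, n_jobs)]


# Events are bunched at the start, so an equal-width time split at 200
# would put 5 events in one job and 3 in the other.
events = [0, 1, 2, 3, 100, 200, 300, 400]
boundaries = split_by_count(events, 2)
```

With the sample data above, the single boundary falls at 100, giving 4
events on each side.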

The script also has value as a reference or starting point for
users.

> It might be
> better to have a helper that makes writing such scripts easier,
> e.g. figuring out reasonable options for manual parallelization
> based on the input file. I think parts of your script do that, maybe
> it is usable for that.

The --dry-run option shows the perf script commands, but an option
to pipe through another command could be added.

>
> Also as a default output it would be better to just merge the
> original output in order and output it on stdout.

That assumes the output is perf script's own printed output and
not the output of a perf script _script_.

If the data is split by CPU, it will not be in time order
if it is simply concatenated back together.
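
To make the ordering point concrete, a minimal sketch, not part of the
patch, of a time-ordered merge of per-CPU outputs. It assumes every
output line begins with a numeric timestamp field, which does not hold
for perf script output in general, and timestamp_key() and
merge_by_time() are hypothetical names.

```python
import heapq


def timestamp_key(line):
    # Assumption: the first whitespace-separated field is a timestamp in seconds.
    return float(line.split(None, 1)[0])


def merge_by_time(streams):
    # Each stream must already be in time order on its own (one per CPU).
    return heapq.merge(*streams, key=timestamp_key)


cpu0 = ["1.000100 cpu0 a", "1.000300 cpu0 b"]
cpu1 = ["1.000200 cpu1 c", "1.000400 cpu1 d"]

concatenated = cpu0 + cpu1                    # not in time order
merged = list(merge_by_time([cpu0, cpu1]))    # interleaved by timestamp
```

Simple concatenation keeps each CPU's lines together, while the merge
interleaves them by timestamp. That only works when the timestamp can be
parsed from every line.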

>
> You should probably limit the number of jobs to some minimum
> length, otherwise on systems with many CPUs there might be
> inefficiently short jobs.

That happens for Intel PT (64 PSB minimum), but could be added
for the normal case also.
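
A minimal sketch of such a limit, not taken from the patch. limit_jobs()
is a hypothetical helper, and the "units" are whatever measure of work is
available (PSBs for Intel PT, or perhaps samples or duration otherwise).

```python
def limit_jobs(total_units, max_jobs, min_units_per_job):
    """Cap the job count so that each job gets at least min_units_per_job.

    Assumption: all arguments are positive integers. At least one job is
    always returned, even if the data is smaller than the minimum.
    """
    return max(1, min(max_jobs, total_units // min_units_per_job))


few = limit_jobs(1000, 64, 64)      # small trace: fewer jobs than CPUs
many = limit_jobs(100000, 64, 64)   # large trace: capped at max_jobs
tiny = limit_jobs(10, 64, 64)       # below the minimum: a single job
```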