Re: [PATCH v1] perf stat: Combine the -A/--no-aggr and --no-merge options

From: Liang, Kan
Date: Thu Dec 14 2023 - 10:12:25 EST




On 2023-12-14 1:02 a.m., Ian Rogers wrote:
> The -A or --no-aggr option disables aggregation of core events:
> ```
> $ perf stat -A -e cycles,data_total -a true
>
> Performance counter stats for 'system wide':
>
> CPU0 1,287,665 cycles
> CPU1 1,831,681 cycles
> CPU2 27,345,998 cycles
> CPU3 1,964,799 cycles
> CPU4 236,174 cycles
> CPU5 3,302,825 cycles
> CPU6 9,201,446 cycles
> CPU7 1,403,043 cycles
> CPU0 110.90 MiB data_total
>
> 0.008961761 seconds time elapsed
> ```
>
> The --no-merge option disables the aggregation of uncore events:
> ```
> $ perf stat --no-merge -e cycles,data_total -a true
>
> Performance counter stats for 'system wide':
>
> 38,482,778 cycles
> 15.04 MiB data_total [uncore_imc_free_running_1]
> 15.00 MiB data_total [uncore_imc_free_running_0]
>
> 0.005915155 seconds time elapsed
> ```
>
> Having two options confuses users who generally don't appreciate the
> difference in PMUs. Keep all the options but make it so they all
> disable aggregation both of core and uncore events:
> ```
> $ perf stat -A -e cycles,data_total -a true
>
> Performance counter stats for 'system wide':
>
> CPU0 85,878 cycles
> CPU1 88,179 cycles
> CPU2 60,872 cycles
> CPU3 3,265,567 cycles
> CPU4 82,357 cycles
> CPU5 83,383 cycles
> CPU6 84,156 cycles
> CPU7 220,803 cycles
> CPU0 2.38 MiB data_total [uncore_imc_free_running_0]
> CPU0 2.38 MiB data_total [uncore_imc_free_running_1]
>
> 0.001397205 seconds time elapsed
> ```
>
> Update the relevant perf-stat man page information.
>
> Signed-off-by: Ian Rogers <irogers@xxxxxxxxxx>

Reviewed-by: Kan Liang <kan.liang@xxxxxxxxxxxxxxx>

Thanks,
Kan

> ---
> tools/perf/Documentation/perf-stat.txt | 52 ++++++++++++++------------
> tools/perf/builtin-stat.c | 5 ++-
> tools/perf/util/stat-display.c | 2 +-
> tools/perf/util/stat.c | 2 +-
> tools/perf/util/stat.h | 1 -
> 5 files changed, 33 insertions(+), 29 deletions(-)
>
> diff --git a/tools/perf/Documentation/perf-stat.txt b/tools/perf/Documentation/perf-stat.txt
> index 8f789fa1242e..5af2e432b54f 100644
> --- a/tools/perf/Documentation/perf-stat.txt
> +++ b/tools/perf/Documentation/perf-stat.txt
> @@ -422,7 +422,34 @@ See perf list output for the possible metrics and metricgroups.
>
> -A::
> --no-aggr::
> -Do not aggregate counts across all monitored CPUs.
> +--no-merge::
> +Do not aggregate/merge counts across monitored CPUs or PMUs.
> +
> +When multiple events are created from a single event specification,
> +stat will, by default, aggregate the event counts and show the result
> +in a single row. This option disables that behavior and shows the
> +individual events and counts.
> +
> +Multiple events are created from a single event specification when:
> +
> +1. PID monitoring isn't requested and the system has more than one
> + CPU. For example, a system with 8 SMT threads will have one event
> + opened on each thread and aggregation is performed across them.
> +
> +2. Prefix or glob wildcard matching is used for the PMU name. For
> + example, multiple memory controller PMUs may exist typically with a
> + suffix of _0, _1, etc. By default the event counts will all be
> + combined if the PMU is specified without the suffix such as
> + uncore_imc rather than uncore_imc_0.
> +
> +3. Aliases, which are listed immediately after the Kernel PMU events
> + by perf list, are used.
> +
> +--hybrid-merge::
> +Merge core event counts from all core PMUs. In hybrid or big.LITTLE
> +systems by default each core PMU will report its count
> +separately. This option forces core PMU counts to be combined to give
> +a behavior closer to having a single CPU type in the system.
>
> --topdown::
> Print top-down metrics supported by the CPU. This allows to determine
> @@ -475,29 +502,6 @@ highlight 'tma_frontend_bound'. This metric may be drilled into with
>
> Error out if the input is higher than the supported max level.
>
> ---no-merge::
> -Do not merge results from same PMUs.
> -
> -When multiple events are created from a single event specification,
> -stat will, by default, aggregate the event counts and show the result
> -in a single row. This option disables that behavior and shows
> -the individual events and counts.
> -
> -Multiple events are created from a single event specification when:
> -1. Prefix or glob matching is used for the PMU name.
> -2. Aliases, which are listed immediately after the Kernel PMU events
> - by perf list, are used.
> -
> ---hybrid-merge::
> -Merge the hybrid event counts from all PMUs.
> -
> -For hybrid events, by default, the stat aggregates and reports the event
> -counts per PMU. But sometimes, it's also useful to aggregate event counts
> -from all PMUs. This option enables that behavior and reports the counts
> -without PMUs.
> -
> -For non-hybrid events, it should be no effect.
> -
> --smi-cost::
> Measure SMI cost if msr/aperf/ and msr/smi/ events are supported.
>
> diff --git a/tools/perf/builtin-stat.c b/tools/perf/builtin-stat.c
> index bda020c0b9d5..5fe9abc6a524 100644
> --- a/tools/perf/builtin-stat.c
> +++ b/tools/perf/builtin-stat.c
> @@ -1204,8 +1204,9 @@ static struct option stat_options[] = {
> OPT_STRING('C', "cpu", &target.cpu_list, "cpu",
> "list of cpus to monitor in system-wide"),
> OPT_SET_UINT('A', "no-aggr", &stat_config.aggr_mode,
> - "disable CPU count aggregation", AGGR_NONE),
> - OPT_BOOLEAN(0, "no-merge", &stat_config.no_merge, "Do not merge identical named events"),
> + "disable aggregation across CPUs or PMUs", AGGR_NONE),
> + OPT_SET_UINT(0, "no-merge", &stat_config.aggr_mode,
> + "disable aggregation the same as -A or -no-aggr", AGGR_NONE),
> OPT_BOOLEAN(0, "hybrid-merge", &stat_config.hybrid_merge,
> "Merge identical named hybrid events"),
> OPT_STRING('x', "field-separator", &stat_config.csv_sep, "separator",
> diff --git a/tools/perf/util/stat-display.c b/tools/perf/util/stat-display.c
> index afe6db8e7bf4..8c61f8627ebc 100644
> --- a/tools/perf/util/stat-display.c
> +++ b/tools/perf/util/stat-display.c
> @@ -898,7 +898,7 @@ static bool hybrid_uniquify(struct evsel *evsel, struct perf_stat_config *config
>
> static void uniquify_counter(struct perf_stat_config *config, struct evsel *counter)
> {
> - if (config->no_merge || hybrid_uniquify(counter, config))
> + if (config->aggr_mode == AGGR_NONE || hybrid_uniquify(counter, config))
> uniquify_event_name(counter);
> }
>
> diff --git a/tools/perf/util/stat.c b/tools/perf/util/stat.c
> index 012c4946b9c4..b0bcf92f0f9c 100644
> --- a/tools/perf/util/stat.c
> +++ b/tools/perf/util/stat.c
> @@ -592,7 +592,7 @@ void perf_stat_merge_counters(struct perf_stat_config *config, struct evlist *ev
> {
> struct evsel *evsel;
>
> - if (config->no_merge)
> + if (config->aggr_mode == AGGR_NONE)
> return;
>
> evlist__for_each_entry(evlist, evsel)
> diff --git a/tools/perf/util/stat.h b/tools/perf/util/stat.h
> index 325d0fad1842..4357ba114822 100644
> --- a/tools/perf/util/stat.h
> +++ b/tools/perf/util/stat.h
> @@ -76,7 +76,6 @@ struct perf_stat_config {
> bool null_run;
> bool ru_display;
> bool big_num;
> - bool no_merge;
> bool hybrid_merge;
> bool walltime_run_table;
> bool all_kernel;