[PATCHSET v2 0/3] perf stat: Enable BPF counters with --for-each-cgroup

From: Namhyung Kim
Date: Mon Jun 14 2021 - 22:56:51 EST


Hello,

This series adds BPF support to --for-each-cgroup to handle many cgroup
events on big machines. You can use the --bpf-counters option to enable
the new behavior.

* changes in v2
- remove incorrect use of BPF_F_PRESERVE_ELEMS
- add missing map elements after lookup
- handle cgroup v1

The basic idea is to use a single set of per-cpu events to count the
events of interest and aggregate them into each cgroup. I used the
bperf mechanism to run a BPF program on cgroup switches and save the
results in the matching map element for the given cgroup.

Without this, we need separate events for each cgroup, which creates
unnecessary multiplexing (and PMU programming) overhead when tasks in
different cgroups are switched. I've seen this make a big difference
on 256-CPU machines with hundreds of cgroups.

Actually this is what I wanted to do in the kernel [1], but IIUC we
can do the job using BPF, albeit with some limitations.

Current limitations are:
* it doesn't support cgroup hierarchy
* there's no reliable way to trigger running the BPF program


Thanks,
Namhyung


[1] https://lore.kernel.org/lkml/20210413155337.644993-1-namhyung@xxxxxxxxxx/


Namhyung Kim (3):
perf tools: Add read_cgroup_id() function
perf tools: Add cgroup_is_v2() helper
perf stat: Enable BPF counter with --for-each-cgroup

tools/perf/Makefile.perf | 1 +
tools/perf/util/Build | 1 +
tools/perf/util/bpf_counter.c | 5 +
tools/perf/util/bpf_counter_cgroup.c | 319 ++++++++++++++++++++
tools/perf/util/bpf_skel/bperf_cgroup.bpf.c | 146 +++++++++
tools/perf/util/cgroup.c | 46 +++
tools/perf/util/cgroup.h | 12 +
7 files changed, 530 insertions(+)
create mode 100644 tools/perf/util/bpf_counter_cgroup.c
create mode 100644 tools/perf/util/bpf_skel/bperf_cgroup.bpf.c

--
2.32.0.272.g935e593368-goog