Re: [PATCHSET v4 0/4] perf stat: Enable BPF counters with --for-each-cgroup

From: Namhyung Kim
Date: Sun Jun 27 2021 - 11:30:14 EST


On Fri, Jun 25, 2021 at 12:18 AM Namhyung Kim <namhyung@xxxxxxxxxx> wrote:
>
> Hello,
>
> This is to add BPF support to --for-each-cgroup in order to handle many
> cgroup events on big machines. You can use the --bpf-counters option to
> enable the new behavior.
>
> * changes in v4
> - convert cgrp_readings to a per-cpu array map
> - remove now-unused cpu_idx map
> - move common functions to a header file
> - reuse bpftool bootstrap binary
> - fix build error in the cgroup code
>
> * changes in v3
> - support cgroup hierarchy with ancestor ids
> - add and trigger raw_tp BPF program
> - add a build rule for vmlinux.h
>
> * changes in v2
> - remove incorrect use of BPF_F_PRESERVE_ELEMS
> - add missing map elements after lookup
> - handle cgroup v1
>
> The basic idea is to use a single set of per-cpu events to count the
> events of interest and aggregate them per cgroup. I used the bperf
> mechanism to run a BPF program on cgroup switches and save the results
> in the matching map element for the given cgroup.
>
> Without this, we need separate events for each cgroup, which creates
> unnecessary multiplexing (and PMU programming) overhead whenever tasks
> in different cgroups are switched. I saw this make a big difference on
> 256-CPU machines with hundreds of cgroups.
>
> Actually this is what I wanted to do in the kernel [1], but we can
> do the job using BPF!

Ugh, I found that the current kernel BPF verifier doesn't accept the
bpf_get_current_ancestor_cgroup_id() helper. I will send a fix to the
BPF folks.

Thanks,
Namhyung