Re: [PATCHv3 bpf-next 00/13] bpf: Add kprobe multi link

From: Jiri Olsa
Date: Sat Mar 19 2022 - 10:31:55 EST


On Sat, Mar 19, 2022 at 01:27:56PM +0100, Jiri Olsa wrote:
> On Fri, Mar 18, 2022 at 10:50:46PM -0700, Andrii Nakryiko wrote:
> > On Wed, Mar 16, 2022 at 5:24 AM Jiri Olsa <jolsa@xxxxxxxxxx> wrote:
> > >
> > > hi,
> > > this patchset adds a new link type BPF_TRACE_KPROBE_MULTI that attaches
> > > a kprobe program through the fprobe API [1] introduced by Masami.
> > >
> > > The fprobe API allows attaching a probe to multiple functions at once
> > > very quickly, because it works on top of ftrace. On the other hand, this
> > > limits the probe points to function entry and return.
> > >
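> > > For instance, attaching a single program to all functions matching
> > > a glob pattern through the new bpf_program__attach_kprobe_multi_opts
> > > API looks roughly like this (a minimal sketch, error handling
> > > omitted; the skeleton variable is hypothetical):
> > >
> > >   /* kernel side, built with bpf_helpers.h */
> > >   SEC("kprobe.multi/bpf_fentry_test*")
> > >   int test_kprobe(struct pt_regs *ctx)
> > >   {
> > >           return 0;
> > >   }
> > >
> > >   /* user space */
> > >   LIBBPF_OPTS(bpf_kprobe_multi_opts, opts);
> > >   struct bpf_link *link;
> > >
> > >   link = bpf_program__attach_kprobe_multi_opts(skel->progs.test_kprobe,
> > >                                                "bpf_fentry_test*", &opts);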
> > >
> > > With bpftrace support I see the following attach speed:
> > >
> > > # perf stat --null -r 5 ./src/bpftrace -e 'kprobe:x* { } i:ms:1 { exit(); } '
> > > Attaching 2 probes...
> > > Attaching 3342 functions
> > > ...
> > >
> > > 1.4960 +- 0.0285 seconds time elapsed ( +- 1.91% )
> > >
> > > v3 changes:
> > > - based on latest fprobe post from Masami [2]
> > > - add acks
> > > - add extra comment to kprobe_multi_link_handler wrt entry ip setup [Masami]
> > > - keep swap_words_64 static and swap values directly in
> > > bpf_kprobe_multi_cookie_swap [Andrii]
> > > - rearrange locking/migrate setup in kprobe_multi_link_prog_run [Andrii]
> > > - move uapi fields [Andrii]
> > > - add bpf_program__attach_kprobe_multi_opts function [Andrii]
> > > - many small test changes [Andrii]
> > > - added tests for bpf_program__attach_kprobe_multi_opts
> > > - make kallsyms_lookup_name check for empty string [Andrii]
> > >
> > > v2 changes:
> > > - based on latest fprobe changes [1]
> > > - renaming the uapi interface to kprobe multi
> > > - adding support for sort_r to pass a user pointer to swap functions
> > > and using that in cookie support to keep just a single functions array
> > > - moving new link to kernel/trace/bpf_trace.c file
> > > - using single fprobe callback function for entry and exit
> > > - using kvzalloc, libbpf_ensure_mem functions
> > > - adding new k[ret]probe.multi sections instead of using current kprobe
> > > - used glob_match from test_progs.c, added '?' matching
> > > - move bpf_get_func_ip verifier inline change to a separate change
> > > - couple of other minor fixes
> > >
> > >
> > > Also available at:
> > > https://git.kernel.org/pub/scm/linux/kernel/git/jolsa/perf.git
> > > bpf/kprobe_multi
> > >
> > > thanks,
> > > jirka
> > >
> > >
> > > [1] https://lore.kernel.org/bpf/164458044634.586276.3261555265565111183.stgit@devnote2/
> > > [2] https://lore.kernel.org/bpf/164735281449.1084943.12438881786173547153.stgit@devnote2/
> > > ---
> > > Jiri Olsa (13):
> > > lib/sort: Add priv pointer to swap function
> > > kallsyms: Skip the name search for empty string
> > > bpf: Add multi kprobe link
> > > bpf: Add bpf_get_func_ip kprobe helper for multi kprobe link
> > > bpf: Add support to inline bpf_get_func_ip helper on x86
> > > bpf: Add cookie support to programs attached with kprobe multi link
> > > libbpf: Add libbpf_kallsyms_parse function
> > > libbpf: Add bpf_link_create support for multi kprobes
> > > libbpf: Add bpf_program__attach_kprobe_multi_opts function
> > > selftests/bpf: Add kprobe_multi attach test
> > > selftests/bpf: Add kprobe_multi bpf_cookie test
> > > selftests/bpf: Add attach test for bpf_program__attach_kprobe_multi_opts
> > > selftests/bpf: Add cookie test for bpf_program__attach_kprobe_multi_opts
> > >
> >
> > Ok, so I've integrated multi-attach kprobes into retsnoop. It was
> > pretty straightforward. Here are some numbers for the speed of
> > attaching and, even more importantly, detaching a set of almost
> > 400 functions. It's a bit fewer functions for the fentry-based mode,
> > due to more limited BTF information for static functions. Note that
> > retsnoop attaches two BPF programs for each kernel function, so it's
> > actually two multi-attach kprobes, each attaching to 397 functions.
> > For single-attach kprobes, we perform 397 * 2 = 794 attachments.
> >
> > I've been invoking retsnoop with the following set of
> > functions: -e '*sys_bpf*' -a ':kernel/bpf/syscall.c' -a
> > ':kernel/bpf/verifier.c' -a ':kernel/bpf/btf.c' -a
> > ':kernel/bpf/core.c'. So basically the bpf syscall entry functions
> > and all the discoverable functions from syscall.c, verifier.c,
> > core.c and btf.c in the kernel/bpf subdirectory.
> >
> > Results:
> >
> > fentry attach/detach (263 kfuncs): 2133 ms / 31671 ms (33 seconds)
> > kprobe attach/detach (397 kfuncs): 3121 ms / 13195 ms (16 seconds)
> > multi-kprobe attach/detach (397 kfuncs): 9 ms / 248 ms (0.25 seconds)
> >
> > So as you can see, the speed-up is tremendous! The API is also very
> > convenient; I didn't have to modify retsnoop internals much to
> > accommodate the multi-attach API. Great job!
>
> nice! thanks for doing that so quickly
>
> >
> > Now for the bad news. :(
> >
> > Stack traces from multi-attach kretprobes are broken, which makes all
> > of this way less useful.
> >
> > Below, see stack traces captured with multi- and single-attach
> > kretprobes for two different use cases. The single-kprobe stack
> > traces make much more sense. Ignore that the last function doesn't
> > have an actual address associated with it (i.e., __sys_bpf and
> > bpf_tracing_prog_attach, respectively); that's just how retsnoop
> > does things, I think. We were actually capturing stack traces from
> > inside __sys_bpf (first case) and bpf_tracing_prog_attach (second
> > case).
> >
> > MULTI KPROBE:
> > ffffffff81185a80 __sys_bpf+0x0
> > (kernel/bpf/syscall.c:4622:1)
> >
> > SINGLE KPROBE:
> > ffffffff81e0007c entry_SYSCALL_64_after_hwframe+0x44
> > (arch/x86/entry/entry_64.S:113:0)
> > ffffffff81cd2b15 do_syscall_64+0x35
> > (arch/x86/entry/common.c:80:7)
> > . do_syscall_x64
> > (arch/x86/entry/common.c:50:12)
> > ffffffff811881aa __x64_sys_bpf+0x1a
> > (kernel/bpf/syscall.c:4765:1)
> > __sys_bpf
> >
> >
> > MULTI KPROBE:
> > ffffffff811851b0 bpf_tracing_prog_attach+0x0
> > (kernel/bpf/syscall.c:2708:1)
> >
> > SINGLE KPROBE:
> > ffffffff81e0007c entry_SYSCALL_64_after_hwframe+0x44
> > (arch/x86/entry/entry_64.S:113:0)
> > ffffffff81cd2b15 do_syscall_64+0x35
> > (arch/x86/entry/common.c:80:7)
> > . do_syscall_x64
> > (arch/x86/entry/common.c:50:12)
> > ffffffff811881aa __x64_sys_bpf+0x1a
> > (kernel/bpf/syscall.c:4765:1)
> > ffffffff81185e79 __sys_bpf+0x3f9
> > (kernel/bpf/syscall.c:4705:9)
> > ffffffff8118583a bpf_raw_tracepoint_open+0x19a
> > (kernel/bpf/syscall.c:3069:6)
> > bpf_tracing_prog_attach
> >
> > You can see that with the multi-attach kprobe we get only one entry,
> > which is the very last function in the stack trace. We have no parent
> > functions captured whatsoever. Jiri, Masami, any ideas what's wrong
> > and how to fix this? Let's try to figure this out and fix it before
> > the feature makes it into the kernel release. Thanks in advance!
>
> oops, I should have tried kstack with bpftrace's kretprobe, I see the same:
>
> # ./src/bpftrace -e 'kretprobe:x* { @[kstack] = count(); }'
> Attaching 1 probe...
> Attaching 3340 probes
> ^C
>
> @[
> xfs_trans_apply_dquot_deltas+0
> ]: 22
> @[
> xlog_cil_commit+0
> ]: 22
> @[
> xlog_grant_push_threshold+0
> ]: 22
> @[
> xfs_trans_add_item+0
> ]: 22
> @[
> xfs_log_reserve+0
> ]: 22
> @[
> xlog_space_left+0
> ]: 22
> @[
> xfs_buf_offset+0
> ]: 22
> @[
> xfs_trans_free_dqinfo+0
> ]: 22
> @[
> xlog_ticket_alloc+0
> xfs_log_reserve+5
> ]: 22
> @[
> xfs_cil_prepare_item+0
>
>
> I think it's because we set the original ip for the return probe to make
> bpf_get_func_ip work properly, but of course that breaks the backtrace
>
> I'm not sure we can bring along the original regs for the return probe,
> but I have an idea how to work around the bpf_get_func_ip issue and
> keep the registers intact for the other helpers

the change below uses bpf_run_ctx to store the link and entry ip on the
stack, where the helpers can find them.. it fixed the kretprobe backtrace
for me
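
fwiw a minimal kretprobe.multi program to sanity check the helpers on
top of this change could look like this (just a sketch):

  SEC("kretprobe.multi/bpf_fentry_test*")
  int test_kretprobe(struct pt_regs *ctx)
  {
          /* both helpers now read from the run_ctx stored on the stack,
           * so regs stay intact and bpf_get_stack/kstack keep working
           */
          __u64 ip = bpf_get_func_ip(ctx);
          __u64 cookie = bpf_get_attach_cookie(ctx);

          bpf_printk("entry ip %llx cookie %llu", ip, cookie);
          return 0;
  }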

I had to revert the get_func_ip inlining.. it's squashed into the change
below for quick testing.. I'll send the revert in a separate patch with
the formal change

could you please test?

thanks,
jirka


---
diff --git a/kernel/bpf/verifier.c b/kernel/bpf/verifier.c
index 0287176bfe9a..cf92f9c01556 100644
--- a/kernel/bpf/verifier.c
+++ b/kernel/bpf/verifier.c
@@ -13678,7 +13678,7 @@ static int do_misc_fixups(struct bpf_verifier_env *env)
 			continue;
 		}
 
-		/* Implement tracing bpf_get_func_ip inline. */
+		/* Implement bpf_get_func_ip inline. */
 		if (prog_type == BPF_PROG_TYPE_TRACING &&
 		    insn->imm == BPF_FUNC_get_func_ip) {
 			/* Load IP address from ctx - 16 */
@@ -13693,25 +13693,6 @@ static int do_misc_fixups(struct bpf_verifier_env *env)
 			continue;
 		}
 
-#ifdef CONFIG_X86
-		/* Implement kprobe_multi bpf_get_func_ip inline. */
-		if (prog_type == BPF_PROG_TYPE_KPROBE &&
-		    eatype == BPF_TRACE_KPROBE_MULTI &&
-		    insn->imm == BPF_FUNC_get_func_ip) {
-			/* Load IP address from ctx (struct pt_regs) ip */
-			insn_buf[0] = BPF_LDX_MEM(BPF_DW, BPF_REG_0, BPF_REG_1,
-						  offsetof(struct pt_regs, ip));
-
-			new_prog = bpf_patch_insn_data(env, i + delta, insn_buf, 1);
-			if (!new_prog)
-				return -ENOMEM;
-
-			env->prog = prog = new_prog;
-			insn = new_prog->insnsi + i + delta;
-			continue;
-		}
-#endif
-
 patch_call_imm:
 		fn = env->ops->get_func_proto(insn->imm, env->prog);
 		/* all functions that have prototype and verifier allowed
diff --git a/kernel/trace/bpf_trace.c b/kernel/trace/bpf_trace.c
index 9a7b6be655e4..aefe33c08296 100644
--- a/kernel/trace/bpf_trace.c
+++ b/kernel/trace/bpf_trace.c
@@ -80,7 +80,8 @@ u64 bpf_get_stack(u64 r1, u64 r2, u64 r3, u64 r4, u64 r5);
 static int bpf_btf_printf_prepare(struct btf_ptr *ptr, u32 btf_ptr_size,
 				  u64 flags, const struct btf **btf,
 				  s32 *btf_id);
-static u64 bpf_kprobe_multi_cookie(struct bpf_run_ctx *ctx, u64 ip);
+static u64 bpf_kprobe_multi_cookie(struct bpf_run_ctx *ctx);
+static u64 bpf_kprobe_multi_entry_ip(struct bpf_run_ctx *ctx);
 
 /**
  * trace_call_bpf - invoke BPF program
@@ -1042,8 +1043,7 @@ static const struct bpf_func_proto bpf_get_func_ip_proto_kprobe = {
 
 BPF_CALL_1(bpf_get_func_ip_kprobe_multi, struct pt_regs *, regs)
 {
-	/* This helper call is inlined by verifier on x86. */
-	return instruction_pointer(regs);
+	return bpf_kprobe_multi_entry_ip(current->bpf_ctx);
 }
 
 static const struct bpf_func_proto bpf_get_func_ip_proto_kprobe_multi = {
@@ -1055,7 +1055,7 @@ static const struct bpf_func_proto bpf_get_func_ip_proto_kprobe_multi = {
 
 BPF_CALL_1(bpf_get_attach_cookie_kprobe_multi, struct pt_regs *, regs)
 {
-	return bpf_kprobe_multi_cookie(current->bpf_ctx, instruction_pointer(regs));
+	return bpf_kprobe_multi_cookie(current->bpf_ctx);
 }
 
 static const struct bpf_func_proto bpf_get_attach_cookie_proto_kmulti = {
@@ -2220,15 +2220,16 @@ struct bpf_kprobe_multi_link {
 	struct bpf_link link;
 	struct fprobe fp;
 	unsigned long *addrs;
-	/*
-	 * The run_ctx here is used to get struct bpf_kprobe_multi_link in
-	 * get_attach_cookie helper, so it can't be used to store data.
-	 */
-	struct bpf_run_ctx run_ctx;
 	u64 *cookies;
 	u32 cnt;
 };
 
+struct bpf_kprobe_multi_run_ctx {
+	struct bpf_run_ctx run_ctx;
+	struct bpf_kprobe_multi_link *link;
+	unsigned long entry_ip;
+};
+
 static void bpf_kprobe_multi_link_release(struct bpf_link *link)
 {
 	struct bpf_kprobe_multi_link *kmulti_link;
@@ -2282,18 +2283,21 @@ static int bpf_kprobe_multi_cookie_cmp(const void *a, const void *b, const void
 	return __bpf_kprobe_multi_cookie_cmp(a, b);
 }
 
-static u64 bpf_kprobe_multi_cookie(struct bpf_run_ctx *ctx, u64 ip)
+static u64 bpf_kprobe_multi_cookie(struct bpf_run_ctx *ctx)
 {
+	struct bpf_kprobe_multi_run_ctx *run_ctx;
 	struct bpf_kprobe_multi_link *link;
+	u64 *cookie, entry_ip;
 	unsigned long *addr;
-	u64 *cookie;
 
 	if (WARN_ON_ONCE(!ctx))
 		return 0;
-	link = container_of(ctx, struct bpf_kprobe_multi_link, run_ctx);
+	run_ctx = container_of(current->bpf_ctx, struct bpf_kprobe_multi_run_ctx, run_ctx);
+	link = run_ctx->link;
+	entry_ip = run_ctx->entry_ip;
 	if (!link->cookies)
 		return 0;
-	addr = bsearch(&ip, link->addrs, link->cnt, sizeof(ip),
+	addr = bsearch(&entry_ip, link->addrs, link->cnt, sizeof(entry_ip),
 		       __bpf_kprobe_multi_cookie_cmp);
 	if (!addr)
 		return 0;
@@ -2301,10 +2305,22 @@ static u64 bpf_kprobe_multi_cookie(struct bpf_run_ctx *ctx, u64 ip)
 	return *cookie;
 }
 
+static u64 bpf_kprobe_multi_entry_ip(struct bpf_run_ctx *ctx)
+{
+	struct bpf_kprobe_multi_run_ctx *run_ctx;
+
+	run_ctx = container_of(current->bpf_ctx, struct bpf_kprobe_multi_run_ctx, run_ctx);
+	return run_ctx->entry_ip;
+}
+
 static int
 kprobe_multi_link_prog_run(struct bpf_kprobe_multi_link *link,
-			   struct pt_regs *regs)
+			   unsigned long entry_ip, struct pt_regs *regs)
 {
+	struct bpf_kprobe_multi_run_ctx run_ctx = {
+		.link = link,
+		.entry_ip = entry_ip,
+	};
 	struct bpf_run_ctx *old_run_ctx;
 	int err;
 
@@ -2315,7 +2331,7 @@ kprobe_multi_link_prog_run(struct bpf_kprobe_multi_link *link,
 
 	migrate_disable();
 	rcu_read_lock();
-	old_run_ctx = bpf_set_run_ctx(&link->run_ctx);
+	old_run_ctx = bpf_set_run_ctx(&run_ctx.run_ctx);
 	err = bpf_prog_run(link->link.prog, regs);
 	bpf_reset_run_ctx(old_run_ctx);
 	rcu_read_unlock();
@@ -2330,24 +2346,10 @@ static void
 kprobe_multi_link_handler(struct fprobe *fp, unsigned long entry_ip,
 			  struct pt_regs *regs)
 {
-	unsigned long saved_ip = instruction_pointer(regs);
 	struct bpf_kprobe_multi_link *link;
 
-	/*
-	 * Because fprobe's regs->ip is set to the next instruction of
-	 * dynamic-ftrace instruction, correct entry ip must be set, so
-	 * that the bpf program can access entry address via regs as same
-	 * as kprobes.
-	 *
-	 * Both kprobe and kretprobe see the entry ip of traced function
-	 * as instruction pointer.
-	 */
-	instruction_pointer_set(regs, entry_ip);
-
 	link = container_of(fp, struct bpf_kprobe_multi_link, fp);
-	kprobe_multi_link_prog_run(link, regs);
-
-	instruction_pointer_set(regs, saved_ip);
+	kprobe_multi_link_prog_run(link, entry_ip, regs);
 }
 
 static int
@@ -2514,7 +2516,11 @@ int bpf_kprobe_multi_link_attach(const union bpf_attr *attr, struct bpf_prog *prog)
 {
 	return -EOPNOTSUPP;
 }
-static u64 bpf_kprobe_multi_cookie(struct bpf_run_ctx *ctx, u64 ip)
+static u64 bpf_kprobe_multi_cookie(struct bpf_run_ctx *ctx)
+{
+	return 0;
+}
+static u64 bpf_kprobe_multi_entry_ip(struct bpf_run_ctx *ctx)
 {
 	return 0;
 }