Re: [PATCH] tracepoint: Do not warn on EEXIST or ENOENT

From: Tetsuo Handa
Date: Sat Jun 26 2021 - 21:12:09 EST


On 2021/06/27 3:22, Steven Rostedt wrote:
>> If BPF is expected to register the same tracepoint with the same
>> callback and data more than once, then let's add a call to do that
>> without warning. Like I said, other callers expect the call to succeed
>> unless it's out of memory, which tends to cause other problems.
>
> If BPF is OK with registering the same probe more than once if user
> space expects it, we can add this patch, which allows the caller (in
> this case BPF) to not warn if the probe being registered is already
> registered, and keeps the idea that a probe registered twice is a bug
> for all other use cases.

I think BPF will not register the same tracepoint with the same callback and
data more than once, for bpf(BPF_RAW_TRACEPOINT_OPEN) cleans the request up
by calling bpf_link_cleanup() and returns -EEXIST. But I think BPF relies on
tracepoint_add_func() returning -EEXIST without crashing the kernel.

CPU: 0 PID: 16193 Comm: syz-executor.5 Not tainted 5.13.0-rc7-syzkaller #0
RIP: 0010:tracepoint_add_func+0x1fb/0xa90 kernel/tracepoint.c:291
Call Trace:
tracepoint_probe_register_prio kernel/tracepoint.c:369 [inline]
tracepoint_probe_register+0x9c/0xe0 kernel/tracepoint.c:389
__bpf_probe_register kernel/trace/bpf_trace.c:1843 [inline]
bpf_probe_register+0x15a/0x1c0 kernel/trace/bpf_trace.c:1848
bpf_raw_tracepoint_open+0x34a/0x720 kernel/bpf/syscall.c:2895
__do_sys_bpf+0x2586/0x4f40 kernel/bpf/syscall.c:4453
do_syscall_64+0x3a/0xb0 arch/x86/entry/common.c:47
entry_SYSCALL_64_after_hwframe+0x44/0xae

SYSCALL_DEFINE3(bpf, int, cmd, union bpf_attr __user *, uattr, unsigned int, size) {
switch (cmd) {
case BPF_RAW_TRACEPOINT_OPEN:
err = bpf_raw_tracepoint_open(&attr) {
err = bpf_link_prime(&link->link, &link_primer);
if (err) {
kfree(link);
goto out_put_btp;
}
err = bpf_probe_register(link->btp, prog) {
return __bpf_probe_register(btp, prog) {
return tracepoint_probe_register(tp, (void *)btp->bpf_func, prog) {
return tracepoint_probe_register_prio(tp, probe, data, TRACEPOINT_DEFAULT_PRIO) {
mutex_lock(&tracepoints_mutex); // Serialization start.
ret = tracepoint_add_func(tp, &tp_func, prio) {
old = func_add(&tp_funcs, func, prio); // Returns -EEXIST.
if (IS_ERR(old)) {
WARN_ON_ONCE(PTR_ERR(old) != -ENOMEM); // Crashes due to warn_on_paic==1.
return PTR_ERR(old); // Returns -EEXIST.
}
}
mutex_unlock(&tracepoints_mutex); // Serialization end.
return ret; // Returns -EEXIST.
}
}
}
}
if (err) {
bpf_link_cleanup(&link_primer); // Reject if func_add() returned -EEXIST.
goto out_put_btp;
}
return bpf_link_settle(&link_primer);
}
break;
}
return ret; // Returns -EEXIST to the userspace.
}

On 2021/06/27 0:41, Steven Rostedt wrote:
>> (1) func_add() can reject an attempt to add same tracepoint multiple times
>> by returning -EEXIST to the caller.
>> (2) But tracepoint_add_func() (the caller of func_add()) is calling WARN_ON_ONCE()
>> if func_add() returned -EEXIST.
>
> That's because (before BPF) there's no place in the kernel that tries
> to register the same tracepoint multiple times, and was considered a
> bug if it happened, because there's no ref counters to deal with adding
> them multiple times.

I see. But does that make sense? Since func_add() can fail with -ENOMEM,
all places (even before BPF) needs to be prepared for failures.

>
> If the tracepoint is already registered (with the given function and
> data), then something likely went wrong.

That can be prepared on the caller side of tracepoint_add_func() rather than
tracepoint_add_func() side.

>
>> (3) And tracepoint_add_func() is triggerable via request from userspace.
>
> Only via BPF correct?
>
> I'm not sure how it works, but can't BPF catch that it is registering
> the same tracepoint again?

There is no chance to check whether some tracepoint is already registered, for
tracepoints_mutex is the only lock which gives us a chance to check whether
some tracepoint is already registered.

Should bpf() syscall hold a global lock (like tracepoints_mutex) which will serialize
the entire code in order to check whether some tracepoint is already registered?
That might severely damage concurrency.