Re: [PATCH] bpf: reject blacklisted symbols in kprobe_multi to avoid recursive trap

From: Ze Gao
Date: Mon May 15 2023 - 02:07:06 EST


Dear all,

On Thu, May 11, 2023 at 9:06 AM Ze Gao <zegao2021@xxxxxxxxx> wrote:
>
> I'm afraid filtering in user space tools is not enough, cause it's a kernel BUG.
>
> it would 100% trigger a kernel crash if you run cmd like
>
> retsnoop -e 'pick_next_task_fair' -a ':kernel/sched/*.c' -vvv
>
> which is caused by that BPF_LINK_TYPE_KPROBE_MULTI accidentally
> attaches bpf progs
> to preempt_count_{add, sub}, which in turn triggers stackoverflow
> because the handler itself
> calls those functions.

I managed to see the big picture of this problem by digging into the code.

here is what it goes:

rethook_trampoline_handler{
preempt_disable() {
preempt_count_add() { <-- trace
fprobe_handler() {
ftrace_test_recursion_trylock
...
ftrace_test_recursion_unlock <- it fails to detect the
recursion caused by rethook (rethook_trampoline_handler precisely for
this case) routines.
}
...
rethook_trampoline_handler {
[ wash, rinse, repeat, CRASH!!! ]

There are some pitfalls here:
1. fprobe exit callback should be guarded by ftrace recursion check
as well since user might call any traceable functions
just like kprobe_multi_link_prog_run calls migrate_{disable, enable}.
In this case, detection in fprobe_handler only is not
enough.
2. rethook_trampoline_handler should use preempt_{disable,
enable}_notrace instead because it's beyond recursion-free
region guarded like 1
3. many rethook helper functions are also used outside the
recursion-free regions and therefore they should be marked
notrace

I've post a new series of patches to resolve cases mentioned above:
[Link]: https://lkml.org/lkml/2023/5/14/452

In theory, bpf_kprobe as the user of fprobe+ rethook, is spared from
suffering recursion again by applying these. And
[Link]: https://lore.kernel.org/all/20230513090548.376522-1-zegao@xxxxxxxxxxx/T/#u
becomes optional.

That leaves one final question, whether we need probe blacklist or
bpf_kprobe blacklist depends on how we deal with
user requests if one of its expected hook points fails because of
recursion detected. Do we need to reject them in the first place
by blacklist or let it fail to execute the callback silently, which
needs your gentle advice.

Regards,
Ze