Re: [PATCH] tracing: Add generic test_recursion_try_acquire()

From: Yafang Shao
Date: Sat Apr 15 2023 - 10:34:02 EST


On Sat, Apr 15, 2023 at 9:33 PM Yafang Shao <laoar.shao@xxxxxxxxx> wrote:
>
> On Sat, Apr 15, 2023 at 10:17 AM Steven Rostedt <rostedt@xxxxxxxxxxx> wrote:
> >
> > From: "Steven Rostedt (Google)" <rostedt@xxxxxxxxxxx>
> >
> > The ftrace_test_recursion_trylock() also disables preemption. This is not
> > required, but was a clean up as every place that called it also disabled
> > preemption, and making the two tightly coupled appeared to make the code
> > simpler.
> >
> > But the recursion protection can be used for other purposes that do not
> > require disabling preemption. As the recursion bits are attached to the
> > task_struct, it follows the task, so there's no need for preemption being
> > disabled.
> >
> > Add test_recursion_try_acquire/release() functions to be used generically,
> > and separate it from being associated with ftrace. It also removes the
> > "lock" name, as there is no lock happening. Keeping the "lock" for the
> > ftrace version is better, as it at least differentiates that preemption is
> > being disabled (hence, "locking the CPU").
> >
> > Link: https://lore.kernel.org/linux-trace-kernel/20230321020103.13494-1-laoar.shao@xxxxxxxxx/
> >
> > Cc: Yafang Shao <laoar.shao@xxxxxxxxx>
> > Signed-off-by: Steven Rostedt (Google) <rostedt@xxxxxxxxxxx>
>
> Acked-by: Yafang Shao <laoar.shao@xxxxxxxxx>
>
> +Alexei, bpf
>
> Thanks Steven.
> I almost finished replacing prog->active with
> test_recursion_try_{acquire,release}[1], and yet I need to do more
> tests.
>
> In my verification, I find that something abnormal happens. When I ran
> bpf self tests after the replacement, I found the fentry recursion
> test failed[2]. The test result as follows,
>
> main:PASS:skel_open_and_load 0 nsec
> main:PASS:skel_attach 0 nsec
> main:PASS:pass1 == 0 0 nsec
> main:PASS:pass1 == 1 0 nsec
> main:PASS:pass1 == 2 0 nsec
> main:PASS:pass2 == 0 0 nsec
> main:FAIL:pass2 == 1 unexpected pass2 == 1: actual 2 != expected 1 [0]
> main:FAIL:pass2 == 2 unexpected pass2 == 2: actual 4 != expected 2
> main:PASS:get_prog_info 0 nsec
> main:PASS:recursion_misses 0 nsec
>

I don't have a clear idea why TRACE_CTX_TRANSITION must be needed, but
it seems we have to do below code for the fentry test case,

diff --git a/include/linux/trace_recursion.h b/include/linux/trace_recursion.h
index d48cd92d2364..f6b4601dd604 100644
--- a/include/linux/trace_recursion.h
+++ b/include/linux/trace_recursion.h
@@ -163,21 +163,8 @@ static __always_inline int
trace_test_and_set_recursion(unsigned long ip, unsign
return -1;

bit = trace_get_context_bit() + start;
- if (unlikely(val & (1 << bit))) {
- /*
- * If an interrupt occurs during a trace, and another trace
- * happens in that interrupt but before the preempt_count is
- * updated to reflect the new interrupt context, then this
- * will think a recursion occurred, and the event will be dropped.
- * Let a single instance happen via the TRANSITION_BIT to
- * not drop those events.
- */
- bit = TRACE_CTX_TRANSITION + start;
- if (val & (1 << bit)) {
- do_ftrace_record_recursion(ip, pip);
- return -1;
- }
- }
+ if (unlikely(val & (1 << bit)))
+ return -1;

val |= 1 << bit;
current->trace_recursion = val;


> The reason is that the bpf_map_delete_elem() in this fentry SEC()[2]
> will hit the first if-condition[3] but fail to hit the second
> if-condition[4], so it won't be considered as recursion at the first
> time. So the value 'pass2' will be 2[0]. Illustrated as follows,
>
> SEC("fentry/htab_map_delete_elem")
> pass2++; <<<< Turn out to be 1 after this operation.
> bpf_map_delete_elem(&hash2, &key);
> SEC("fentry/htab_map_delete_elem") <<<< not recursion
> pass2++; <<<< Turn out to be 2 after this operation.
> bpf_map_delete_elem(&hash2, &key);
> SEC("fentry/htab_map_delete_elem") <<<< RECURSION!!
>
> That said, if a function can call itself, the first call won't be
> considered as recursion. Is that expected ?
>
> One thing I can be sure of is that prog->active can't handle the
> interrupted case[5]. If the interrupt happens, it will be considered
> as a recursion.
> But it seems that bpf_map_delete_elem() in this fentry SEC() is still
> in the process context.
>
> [1]. https://lore.kernel.org/bpf/CALOAHbACzCwu-VeMczEJw8A4WFgkL-uQDS1NkcVR2pqEMZyAQQ@xxxxxxxxxxxxxx/
> [2]. https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next.git/tree/tools/testing/selftests/bpf/progs/recursion.c#n38
> [3]. https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next.git/tree/include/linux/trace_recursion.h#n166
> [4]. https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next.git/tree/include/linux/trace_recursion.h#n176
> [5]. https://lore.kernel.org/bpf/20230409220239.0fcf6738@xxxxxxxxxxxxxxxxxxxx/
>
> > ---
> > include/linux/trace_recursion.h | 47 ++++++++++++++++++++++++---------
> > kernel/trace/ftrace.c | 2 ++
> > 2 files changed, 37 insertions(+), 12 deletions(-)
> >
> > diff --git a/include/linux/trace_recursion.h b/include/linux/trace_recursion.h
> > index d48cd92d2364..80de2ee7b4c3 100644
> > --- a/include/linux/trace_recursion.h
> > +++ b/include/linux/trace_recursion.h
> > @@ -150,9 +150,6 @@ extern void ftrace_record_recursion(unsigned long ip, unsigned long parent_ip);
> > # define trace_warn_on_no_rcu(ip) false
> > #endif
> >
> > -/*
> > - * Preemption is promised to be disabled when return bit >= 0.
> > - */
> > static __always_inline int trace_test_and_set_recursion(unsigned long ip, unsigned long pip,
> > int start)
> > {
> > @@ -182,18 +179,11 @@ static __always_inline int trace_test_and_set_recursion(unsigned long ip, unsign
> > val |= 1 << bit;
> > current->trace_recursion = val;
> > barrier();
> > -
> > - preempt_disable_notrace();
> > -
> > return bit;
> > }
> >
> > -/*
> > - * Preemption will be enabled (if it was previously enabled).
> > - */
> > static __always_inline void trace_clear_recursion(int bit)
> > {
> > - preempt_enable_notrace();
> > barrier();
> > trace_recursion_clear(bit);
> > }
> > @@ -205,12 +195,18 @@ static __always_inline void trace_clear_recursion(int bit)
> > * tracing recursed in the same context (normal vs interrupt),
> > *
> > * Returns: -1 if a recursion happened.
> > - * >= 0 if no recursion.
> > + * >= 0 if no recursion and preemption will be disabled.
> > */
> > static __always_inline int ftrace_test_recursion_trylock(unsigned long ip,
> > unsigned long parent_ip)
> > {
> > - return trace_test_and_set_recursion(ip, parent_ip, TRACE_FTRACE_START);
> > + int bit;
> > +
> > + bit = trace_test_and_set_recursion(ip, parent_ip, TRACE_FTRACE_START);
> > + if (unlikely(bit < 0))
> > + return bit;
> > + preempt_disable_notrace();
> > + return bit;
> > }
> >
> > /**
> > @@ -220,6 +216,33 @@ static __always_inline int ftrace_test_recursion_trylock(unsigned long ip,
> > * This is used at the end of a ftrace callback.
> > */
> > static __always_inline void ftrace_test_recursion_unlock(int bit)
> > +{
> > + preempt_enable_notrace();
> > + trace_clear_recursion(bit);
> > +}
> > +
> > +/**
> > + * test_recursion_try_acquire - tests for recursion in same context
> > + *
> > + * This will detect recursion of a function.
> > + *
> > + * Returns: -1 if a recursion happened.
> > + * >= 0 if no recursion
> > + */
> > +static __always_inline int test_recursion_try_acquire(unsigned long ip,
> > + unsigned long parent_ip)
> > +{
> > + return trace_test_and_set_recursion(ip, parent_ip, TRACE_FTRACE_START);
> > +}
> > +
> > +/**
> > + * test_recursion_release - called after a success of test_recursion_try_acquire()
> > + * @bit: The return of a successful test_recursion_try_acquire()
> > + *
> > + * This releases the recursion lock taken by a non-negative return call
> > + * by test_recursion_try_acquire().
> > + */
> > +static __always_inline void test_recursion_release(int bit)
> > {
> > trace_clear_recursion(bit);
> > }
> > diff --git a/kernel/trace/ftrace.c b/kernel/trace/ftrace.c
> > index db8532a4d5c8..1b117ab057ed 100644
> > --- a/kernel/trace/ftrace.c
> > +++ b/kernel/trace/ftrace.c
> > @@ -7299,6 +7299,7 @@ __ftrace_ops_list_func(unsigned long ip, unsigned long parent_ip,
> > if (bit < 0)
> > return;
> >
> > + preempt_disable();
> > do_for_each_ftrace_op(op, ftrace_ops_list) {
> > /* Stub functions don't need to be called nor tested */
> > if (op->flags & FTRACE_OPS_FL_STUB)
> > @@ -7320,6 +7321,7 @@ __ftrace_ops_list_func(unsigned long ip, unsigned long parent_ip,
> > }
> > } while_for_each_ftrace_op(op);
> > out:
> > + preempt_enable();
> > trace_clear_recursion(bit);
> > }
> >
> > --
> > 2.39.2
> >
>
>
> --
> Regards
> Yafang



--
Regards
Yafang