Re: [RFC][PATCH] tracing: Have stack tracer force RCU to be watching

From: Paul E. McKenney
Date: Tue Oct 20 2015 - 16:25:46 EST


On Tue, Oct 20, 2015 at 12:10:31PM -0400, Steven Rostedt wrote:
>
> Paul,
>
> I've spent a couple of days debugging this, and finally found that my
> stack tracer was calling the stack trace code, which calls
> __module_address() which asserts the below.
>
> Is just calling rcu_irq_enter() and rcu_irq_exit() safe to do
> everywhere (with interrupts always disabled)? This patch appears to fix
> the bug.

Yep! Just don't call them from an NMI handler, and don't call them with
interrupts enabled. The patch appears to keep interrupts disabled
throughout, and the surrounding code doesn't look NMI-safe anyway, so it
should be OK.
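
For illustration only, here is a minimal userspace sketch of those two
rules and of the bracketing the patch adds. Everything prefixed with
model_ is a hypothetical stand-in invented for this sketch, not the
kernel's implementation; in particular, the real rcu_irq_enter() and
rcu_irq_exit() do considerably more than bump a counter.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-ins for per-CPU kernel state (assumptions). */
static int  model_rcu_nesting;    /* 0 means RCU is not watching this CPU */
static bool model_irqs_disabled;  /* true between irq save and restore */
static bool model_in_nmi;         /* true while running an NMI handler */

static void model_rcu_irq_enter(void)
{
	/* The two rules: interrupts must be off, and not in an NMI handler. */
	assert(model_irqs_disabled);
	assert(!model_in_nmi);
	model_rcu_nesting++;
}

static void model_rcu_irq_exit(void)
{
	assert(model_irqs_disabled);
	assert(model_rcu_nesting > 0);  /* every exit must pair with an enter */
	model_rcu_nesting--;
}

static bool model_rcu_is_watching(void)
{
	return model_rcu_nesting > 0;
}

/*
 * Shaped like check_stack() in the patch. Returns whether RCU was
 * watching at the point where the stack trace would be saved.
 */
static bool model_check_stack(void)
{
	bool watching;

	model_irqs_disabled = true;           /* stands in for local_irq_save() */
	model_rcu_irq_enter();
	watching = model_rcu_is_watching();   /* the trace would be saved here */
	model_rcu_irq_exit();
	model_irqs_disabled = false;          /* simplification of local_irq_restore() */

	return watching;
}
```

In this model RCU is watching only inside the enter/exit bracket, and
calling model_rcu_irq_enter() with interrupts enabled trips the first
assertion.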

Thanx, Paul

> Peter,
>
> I'm going to be sending a second patch that converts that from a
> WARN_ON() to an open coded WARN_ON_ONCE(), because WARN_ON() also calls
> the stack trace code which calls __module_address() and we end up with
> an infinite warning about it. This prevented me from seeing where the
> bug actually was, and crashed the box.
>
> -- Steve
>
>
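
The recursion described above can be sketched in userspace as follows.
This is a toy model, not the second patch: the model_ names and the
depth cap are assumptions made for illustration, and the sketch assumes
the open-coded variant sets its "warned" flag before reporting, so the
nested stack trace sees the flag already set.

```c
#include <stdbool.h>

#define MODEL_MAX_DEPTH 8  /* artificial cap so the unguarded case terminates */

static int  model_warnings;  /* how many warnings were reported */
static bool model_warned;    /* the "once" flag */

static void model_stack_trace(int depth, bool once);

/* Stand-in for the failing assertion reached from the stack trace code. */
static void model_assert(int depth, bool once)
{
	if (once) {
		if (model_warned)
			return;
		/* Set the flag before reporting, so the nested call bails out. */
		model_warned = true;
	}
	model_warnings++;
	/* Reporting a warning dumps a stack trace, which re-enters the check. */
	model_stack_trace(depth + 1, once);
}

static void model_stack_trace(int depth, bool once)
{
	if (depth >= MODEL_MAX_DEPTH)
		return;
	model_assert(depth, once);
}

/* Runs one scenario and returns the number of warnings reported. */
static int model_run(bool once)
{
	model_warnings = 0;
	model_warned = false;
	model_stack_trace(0, once);
	return model_warnings;
}
```

Without the guard, the model keeps warning until it hits the artificial
cap; with the guard, it warns exactly once.
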
>
> From a2d7629048322ae62bff57f34f5f995e25ed234c Mon Sep 17 00:00:00 2001
> From: "Steven Rostedt (Red Hat)" <rostedt@xxxxxxxxxxx>
> Date: Tue, 20 Oct 2015 11:38:08 -0400
> Subject: [PATCH] tracing: Have stack tracer force RCU to be watching
>
> The stack tracer was triggering the WARN_ON() in module.c:
>
> static void module_assert_mutex_or_preempt(void)
> {
> #ifdef CONFIG_LOCKDEP
> 	if (unlikely(!debug_locks))
> 		return;
>
> 	WARN_ON(!rcu_read_lock_sched_held() &&
> 		!lockdep_is_held(&module_mutex));
> #endif
> }
>
> The reason is that the stack tracer traces all function calls, and some of
> those calls happen while exiting or entering user space and idle. Some of
> these functions are called after RCU had already stopped watching, as RCU
> does not watch userspace or idle CPUs.
>
> If a max stack is hit, then save_stack_trace() is called, which will
> check module addresses and call module_assert_mutex_or_preempt(), and then
> trigger the warning. Sad part is, the warning itself will also do a stack
> trace and trigger the same warning. That probably should be fixed.
>
> The warning was added by 0be964be0d45 "module: Sanitize RCU usage and
> locking", but this bug has probably been around longer. It's unlikely to
> cause much harm by itself, but the new warning causes the system to lock up.
>
> Cc: stable@xxxxxxxxxxxxxxx # 4.2+
> Cc: Peter Zijlstra <peterz@xxxxxxxxxxxxx>
> Cc: "Paul E. McKenney" <paulmck@xxxxxxxxxxxxxxxxxx>
> Signed-off-by: Steven Rostedt <rostedt@xxxxxxxxxxx>
> ---
> kernel/trace/trace_stack.c | 7 +++++++
> 1 file changed, 7 insertions(+)
>
> diff --git a/kernel/trace/trace_stack.c b/kernel/trace/trace_stack.c
> index b746399ab59c..5f29402bff0f 100644
> --- a/kernel/trace/trace_stack.c
> +++ b/kernel/trace/trace_stack.c
> @@ -88,6 +88,12 @@ check_stack(unsigned long ip, unsigned long *stack)
>  	local_irq_save(flags);
>  	arch_spin_lock(&max_stack_lock);
>
> +	/*
> +	 * RCU may not be watching, make it see us.
> +	 * The stack trace code uses rcu_sched.
> +	 */
> +	rcu_irq_enter();
> +
>  	/* In case another CPU set the tracer_frame on us */
>  	if (unlikely(!frame_size))
>  		this_size -= tracer_frame;
> @@ -169,6 +175,7 @@ check_stack(unsigned long ip, unsigned long *stack)
>  	}
>
>   out:
> +	rcu_irq_exit();
>  	arch_spin_unlock(&max_stack_lock);
>  	local_irq_restore(flags);
>  }
> --
> 1.8.3.1
>
