Re: [PATCH 4/4] tracing: Remove RCU work arounds from stack tracer

From: Steven Rostedt
Date: Fri Sep 22 2017 - 21:27:48 EST


On Fri, 22 Sep 2017 15:54:55 -0700
"Paul E. McKenney" <paulmck@xxxxxxxxxxxxxxxxxx> wrote:

> On Fri, Sep 22, 2017 at 06:15:47PM -0400, Steven Rostedt wrote:
> > From: "Steven Rostedt (VMware)" <rostedt@xxxxxxxxxxx>
> >
> > While debugging some RCU issues with the stack tracer, it was discovered
> > that the problem was much more than with the stack tracer itself, but with
> > the saving of the stack trace, which could happen from any WARN() as well.
> > The problem was fixed within kernel_text_address().
> >
> > One of the bugs that was discovered was that the stack tracer called
> > rcu_enter_irq() unconditionally. Paul McKenney said that could cause issues
> > as well. Instead of adding logic to only call rcu_enter_irq() if RCU is not
> > watching from within the stack tracer, since the core issue has been fixed
> > (within save_stack_trace()), we can simply remove all the logic in the stack
> > tracer that deals with RCU work arounds.
>
> I must confess that I am having some difficulty parsing this paragraph,
> especially the last sentence...
>
> Does this capture it?
>
> One problem is that the stack tracer called rcu_irq_enter()
> unconditionally, which is problematic if RCU's last
> not-watching-to-watching transition was carried out by
> rcu_nmi_enter. In that case, rcu_irq_enter() actually switches

I thought the rcu_irq_enter() after rcu_nmi_enter() was a separate bug.
Your original complaint was that I called rcu_irq_enter()
unconditionally, and wanted me to only call it if RCU wasn't watching.

But, the new code could possibly have this get called after
rcu_nmi_enter() because we are calling it without in_nmi() being set.


> RCU back to the not-watching state for this CPU, which results
> in lockdep splats complaining about rcu_read_lock() being
> used on an idle (not-watched) CPU. The first patch of this
> series addressed this problem by having rcu_irq_enter() and
> rcu_irq_exit() refrain from doing anything when rcu_nmi_enter()
> caused RCU to start watching this CPU. The third patch in this
> series caused save_stack_trace() to invoke rcu_nmi_enter() and
> rcu_nmi_exit() as needed, so this fourth patch now removes the
> rcu_irq_enter() and rcu_irq_exit() from within the stack tracer.
>
> One further question... Can I now remove the rcu_irq_enter_disabled()
> logic?

After this goes in. Yes. But that doesn't need to be a stable change.

>
> > Cc: "Paul E. McKenney" <paulmck@xxxxxxxxxxxxxxxxxx>
> > Cc: stable@xxxxxxxxxxxxxxx
> > Fixes: 0be964be0 ("module: Sanitize RCU usage and locking")
> > Suggested-by: "Paul E. McKenney" <paulmck@xxxxxxxxxxxxxxxxxx>
> > Signed-off-by: Steven Rostedt (VMware) <rostedt@xxxxxxxxxxx>
>
> With the hard-to-parse paragraph fixed:
>
> Acked-by: Paul E. McKenney <paulmck@xxxxxxxxxxxxxxxxxx>

Thanks, but we may need to go back and forth a bit to get the change
log correct.

Remember the first bug. The one that was fixed by changing
rcu_irq_enter() to rcu_nmi_enter()?

=============================
WARNING: suspicious RCU usage
4.13.0-rc7-test+ #117 Tainted: G W
-----------------------------
/work/git/linux-trace.git/arch/x86/kernel/traps.c:305 entry code didn't wake RCU!

other info that might help us debug this:


RCU used illegally from idle CPU!
rcu_scheduler_active = 2, debug_locks = 1
RCU used illegally from extended quiescent state!
no locks held by swapper/1/0.

stack backtrace:
CPU: 1 PID: 0 Comm: swapper/1 Tainted: G W 4.13.0-rc7-test+ #117
Hardware name: Hewlett-Packard HP Compaq Pro 6300 SFF/339A, BIOS K01 v03.03 07/14/2016
Call Trace:
dump_stack+0x86/0xcf
lockdep_rcu_suspicious+0xc5/0x100
do_error_trap+0x125/0x130
? do_error_trap+0x5/0x130
? trace_hardirqs_off_thunk+0x1a/0x1c
? do_invalid_op+0x5/0x30
do_invalid_op+0x20/0x30
invalid_op+0x1e/0x30
RIP: 0010:module_assert_mutex_or_preempt+0x34/0x40
RSP: 0018:ffffc900006abc58 EFLAGS: 00010046
RAX: 0000000000000000 RBX: ffffffffa000a077 RCX: 0000000000000002
RDX: 0000000000000000 RSI: 00000000ffffffff RDI: 0000000000000046
RBP: ffffc900006abc58 R08: ffffc900006abf40 R09: 0000000000000000
R10: 0000000000000000 R11: 0000000000000000 R12: 0000000000000000
R13: 0000000000000000 R14: ffff8801188d8040 R15: ffffffff81ed5720
? 0xffffffffa000a077
? module_assert_mutex_or_preempt+0x30/0x40
__module_address+0x2c/0xf0
? 0xffffffffa000a077
__module_text_address+0x12/0x60
? 0xffffffffa000a077
is_module_text_address+0x1f/0x50
? 0xffffffffa000a077
__kernel_text_address+0x30/0x90
unwind_get_return_address+0x1f/0x30
__save_stack_trace+0x83/0xd0
? 0xffffffffa000a077
? rcu_dynticks_eqs_exit+0x5/0x40
save_stack_trace+0x1b/0x20
check_stack+0xf8/0x2f0
? rcu_dynticks_eqs_enter+0x30/0x30
stack_trace_call+0x6e/0x80
0xffffffffa000a077
? ftrace_graph_caller+0x78/0xa8
? rcu_dynticks_eqs_exit+0x5/0x40
rcu_dynticks_eqs_exit+0x5/0x40
rcu_idle_exit+0xdf/0xf0
? rcu_dynticks_eqs_exit+0x5/0x40
? rcu_idle_exit+0xdf/0xf0
do_idle+0x128/0x200
cpu_startup_entry+0x1d/0x20
start_secondary+0x108/0x130
secondary_startup_64+0x9f/0x9f

This was caused by just using rcu_irq_enter(). Not sure if this will
still be an issue or not. But because we now add an rcu_nmi_enter()
without being in_nmi(), we probably should do this. This code doesn't
run if in_nmi() is true, but it could run from the stack trace dump
itself, and that now calls rcu_nmi_enter().

Actually, thinking about this more, this doesn't need to go in stable.
As recursive rcu_irq_enter() calls should not hurt, and you now allow
rcu_irq_enter() to be called even after a rcu_nmi_enter() right?

-- Steve