Re: perf: fuzzer BUG: KASAN: stack-out-of-bounds in __unwind_start

From: Paul E. McKenney
Date: Tue Nov 29 2016 - 09:07:47 EST


On Tue, Nov 29, 2016 at 10:16:50AM +0100, Peter Zijlstra wrote:
> On Mon, Nov 28, 2016 at 11:52:41PM -0600, Josh Poimboeuf wrote:
> > > We used to do that, but the resulting NMIs were problematic on some
> > > platforms. Perhaps things have gotten better?
> >
> > Did a little digging on git blame and found the following commit (which
> > seems to be the cause of the KASAN warning and missing stack dump):
> >
> > bc1dce514e9b ("rcu: Don't use NMIs to dump other CPUs' stacks")
> >
> > I presume this commit is still needed because of the NMI printk deadlock
> > issues which were discussed at Kernel Summit. I guess those issues need
> > to be sorted out before the above commit can be reverted.
>
> so printk should more or less work from NMI, esp. after:
>
> 42a0bb3f7138 ("printk/nmi: generic solution for safe printk in NMI")

And of course bc1dce514e9b doesn't revert cleanly, but see hand reversion
below. Also, 42a0bb3f7138's commit log calls out MN10300 and Xtensa as
needing more work. Has that happened?

But I really like the fact that RCU CPU stall warnings dump only those
stacks that are likely to be involved, and the patch below goes back
to dumping everyone. Shouldn't be that hard to fix, though...
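
One way to get back to dumping only the likely-involved stacks might look roughly like the user-space sketch below. It is a model only, not kernel code: the model_* helpers, arch_has_nmi_backtrace, NR_CPUS of 8, and the two counters are hypothetical stand-ins for illustration, and the qsmask argument is assumed to have one bit set per CPU that is still holding up the grace period (in the spirit of rnp->qsmask). For each such CPU it tries an NMI-style backtrace first and falls back to a remote dump if that is unavailable.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>

#define NR_CPUS 8	/* hypothetical CPU count for this model */

/* Hypothetical stand-ins for kernel facilities, for illustration only. */
static bool arch_has_nmi_backtrace = true;
static unsigned int nmi_dumps, manual_dumps;

/* Models a per-CPU NMI backtrace request: true if the "NMI" was sent. */
static bool model_trigger_single_cpu_backtrace(int cpu)
{
	assert(cpu >= 0 && cpu < NR_CPUS);
	if (!arch_has_nmi_backtrace)
		return false;
	printf("NMI backtrace for CPU %d\n", cpu);
	nmi_dumps++;
	return true;
}

/* Models the less accurate remote stack dump used as a fallback. */
static void model_dump_cpu_task(int cpu)
{
	printf("remote stack dump for CPU %d\n", cpu);
	manual_dumps++;
}

/*
 * Dump only the CPUs whose bit is set in qsmask, preferring the
 * NMI-based path.  Returns the number of CPUs dumped.
 */
static int dump_stalled_cpus(unsigned long qsmask)
{
	int cpu, dumped = 0;

	for (cpu = 0; cpu < NR_CPUS; cpu++) {
		if (!(qsmask & (1UL << cpu)))
			continue;	/* this CPU is not holding things up */
		if (!model_trigger_single_cpu_backtrace(cpu))
			model_dump_cpu_task(cpu);
		dumped++;
	}
	return dumped;
}
```

Making the NMI-or-fallback decision per CPU, rather than once for the whole system, is what keeps uninvolved CPUs out of the stall report.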

Thanx, Paul

------------------------------------------------------------------------

commit e7c9d76ed508fe978c6657e33f4de1b160ee4efe
Author: Paul E. McKenney <paulmck@xxxxxxxxxxxxxxxxxx>
Date: Tue Nov 29 05:49:06 2016 -0800

rcu: Once again use NMI-based stack traces in stall warnings

This commit is for all intents and purposes a revert of bc1dce514e9b
("rcu: Don't use NMIs to dump other CPUs' stacks"). The reason to
suppose that this can now safely be reverted is the presence of
42a0bb3f7138 ("printk/nmi: generic solution for safe printk in NMI"),
which is said to have made NMI-based stack dumps safe.

Not-yet-signed-off-by: Paul E. McKenney <paulmck@xxxxxxxxxxxxxxxxxx>
Cc: Petr Mladek <pmladek@xxxxxxxx>
Cc: Josh Poimboeuf <jpoimboe@xxxxxxxxxx>
Cc: Peter Zijlstra <peterz@xxxxxxxxxxxxx>

diff --git a/kernel/rcu/tree.c b/kernel/rcu/tree.c
index 91a68e4e6671..d73ccd4bed86 100644
--- a/kernel/rcu/tree.c
+++ b/kernel/rcu/tree.c
@@ -1396,7 +1396,10 @@ static void rcu_check_gp_kthread_starvation(struct rcu_state *rsp)
 }

 /*
- * Dump stacks of all tasks running on stalled CPUs.
+ * Dump stacks of all tasks running on stalled CPUs. First try using
+ * NMIs, but fall back to manual remote stack tracing on architectures
+ * that don't support NMI-based stack dumps. The NMI-triggered stack
+ * traces are more accurate because they are printed by the target CPU.
  */
 static void rcu_dump_cpu_stacks(struct rcu_state *rsp)
 {
@@ -1404,6 +1407,8 @@ static void rcu_dump_cpu_stacks(struct rcu_state *rsp)
 	unsigned long flags;
 	struct rcu_node *rnp;

+	if (trigger_all_cpu_backtrace())
+		return;
 	rcu_for_each_leaf_node(rsp, rnp) {
 		raw_spin_lock_irqsave_rcu_node(rnp, flags);
 		if (rnp->qsmask != 0) {