Re: stack unwinder warning.

From: Josh Poimboeuf
Date: Thu Jan 05 2017 - 12:06:21 EST


On Thu, Jan 05, 2017 at 08:52:49AM -0600, Josh Poimboeuf wrote:
> On Tue, Dec 27, 2016 at 02:00:30PM -0500, Dave Jones wrote:
> > I'm not sure what to make of this. Josh ? (4.10-rc1)
> >
> > WARNING: kernel stack frame pointer at ffffc900003e7858 in trinity-c6:29122 has bad value ffffffff82103a80
> > unwind stack type:0 next_sp: (null) mask:2 graph_idx:0
> > ffffc900003e7808: ffffffff811a02e5 (ring_buffer_lock_reserve+0x1d5/0x580)
> > ffffc900003e7810: ffffffff8119adc3 (rb_commit+0x93/0x350)
> > ffffc900003e7818: ffffffff811b31d4 (function_trace_call+0x104/0x1f0)
> > ffffc900003e7820: ffff8804f10ec000 (0xffff8804f10ec000)
> > ffffc900003e7828: 0000000000000000 ...
> > ffffc900003e7830: ffffffff8119b3ae (ring_buffer_unlock_commit+0x8e/0x120)
> > ffffc900003e7838: 0000000000000001 (0x1)
> > ffffc900003e7840: ffffea0002854e00 (0xffffea0002854e00)
> > ffffc900003e7848: 000000000000000a (0xa)
> > ffffc900003e7850: ffffea0002854ec0 (0xffffea0002854ec0)
> > ffffc900003e7858: ffffea000287c480 (0xffffea000287c480)
>
> The value reported by the warning contradicts the value reported by the
> dump. So this seems to have been caused by dumping the stack of a task
> which is running on another CPU. There are still some places in the
> code where that's possible. So I'm going to need to remove these
> unwinder warnings for now.

I'll be submitting the following patch soon, which I think should
silence the warning. If you can reproduce the warning, would you mind
testing it?


diff --git a/arch/x86/kernel/unwind_frame.c b/arch/x86/kernel/unwind_frame.c
index 4443e49..6fda186 100644
--- a/arch/x86/kernel/unwind_frame.c
+++ b/arch/x86/kernel/unwind_frame.c
@@ -207,6 +207,16 @@ bool unwind_next_frame(struct unwind_state *state)
 	return true;
 
 bad_address:
+	/*
+	 * When dumping a task other than current, the task might actually be
+	 * running on another CPU, in which case it could be modifying its
+	 * stack while we're reading it. This is generally not a problem and
+	 * can be ignored as long as the caller understands that unwinding
+	 * another task will not always succeed.
+	 */
+	if (state->task != current)
+		goto the_end;
+
 	if (state->regs) {
 		printk_deferred_once(KERN_WARNING
 			"WARNING: kernel stack regs at %p in %s:%d has bad 'bp' value %p\n",
--
2.7.4
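
For anyone who wants to poke at the logic outside the kernel, here is a
minimal userspace sketch of what the guard does. It is not the kernel
code: struct task, the cut-down struct unwind_state, its 'done' field,
handle_bad_address() and the current_task parameter (standing in for
'current') are all made up for illustration. It only models the
decision "warn if we are unwinding ourselves, stay quiet otherwise".

```c
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical stand-in for the kernel's task_struct. */
struct task {
	int pid;
};

/* Hypothetical, heavily reduced stand-in for struct unwind_state. */
struct unwind_state {
	const struct task *task;	/* task whose stack is being walked */
	bool done;			/* set once unwinding has stopped */
};

/*
 * Model of the bad_address path. Returns true if a warning was printed.
 * 'current_task' stands in for the kernel's 'current'.
 */
static bool handle_bad_address(struct unwind_state *state,
			       const struct task *current_task)
{
	bool warned = false;

	/*
	 * Only warn when unwinding our own stack. Another task may be
	 * running on a different CPU and changing its stack under us,
	 * so a bad value there is not necessarily a bug.
	 */
	if (state->task == current_task) {
		fprintf(stderr, "WARNING: bad frame pointer in pid %d\n",
			state->task->pid);
		warned = true;
	}

	/* Either way, stop unwinding. */
	state->done = true;
	return warned;
}
```

In both cases the unwind stops; the only difference is whether the
warning is printed.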