Re: [PATCH 02/16] x86/dumpstack: Add get_stack_info() support for the SYSENTER stack

From: Josh Poimboeuf
Date: Mon Nov 20 2017 - 21:29:45 EST


On Mon, Nov 20, 2017 at 05:39:34PM -0800, Andy Lutomirski wrote:
> On Mon, Nov 20, 2017 at 1:55 PM, Josh Poimboeuf <jpoimboe@xxxxxxxxxx> wrote:
> > On Mon, Nov 20, 2017 at 01:30:12PM -0800, Andy Lutomirski wrote:
> >> On Mon, Nov 20, 2017 at 1:27 PM, Josh Poimboeuf <jpoimboe@xxxxxxxxxx> wrote:
> >> > On Mon, Nov 20, 2017 at 01:07:16PM -0800, Andy Lutomirski wrote:
> >> >> >> but, more importantly, the OOPS unwinder will just bail without this
> >> >> >> patch. With the patch, we get a valid unwind, except that everything
> >> >> >> has a ? in front.
> >> >> >
> >> >> > Hm. I can't even fathom how that's possible. Are you talking about the
> >> >> > "unwind from NMI to SYSENTER stack" path? Or any unwind to a syscall?
> >> >> > Either way I'm baffled... If the unwinder only encounters the SYSENTER
> >> >> > stack at the end, how could that cause everything beforehand to have a
> >> >> > question mark?
> >> >>
> >> >> I mean that, if I put a ud2 or other bug in the code that runs on the
> >> >> SYSENTER stack, without this patch, I get a totally blank call trace.
> >> >
> >> > I would expect a blank call trace either way...
> >>
> >> Try making sync_regs use a few kB of stack space or, better yet, call
> >> a non-inlined function that uses too much stack.
> >
> > You mean overflow the exception stack? I still don't see how that would
> > do it.
> >
> > If you could show a specific example, with splats from before/after,
> > that would be helpful. Because I still have no idea how this patch
> > could possibly help.
>
> I added BUG() to sync_regs(). With the patch, I get:
>
> [ 4.211553] PANIC: double fault, error_code: 0x0
> [ 4.212113] CPU: 0 PID: 1 Comm: sh Not tainted 4.14.0+ #920
> [ 4.213536] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.10.2-1.fc26 04/01/2014
> [ 4.213536] task: ffff88001aa18000 task.stack: ffff88001aa20000
> [ 4.214059] RIP: 0010:do_error_trap+0x33/0x1c0
> [ 4.214449] RSP: 0000:ffffffffff1b8f78 EFLAGS: 00010096
> [ 4.214934] RAX: dffffc0000000000 RBX: ffffffffff1b8f90 RCX: 0000000000000006
> [ 4.215554] RDX: ffffffff82048b20 RSI: 0000000000000000 RDI: ffffffffff1b9110
> [ 4.216176] RBP: ffffffffff1b9088 R08: 0000000000000004 R09: 0000000000000000
> [ 4.216793] R10: 0000000000000000 R11: fffffbffffe3723f R12: 0000000000000006
> [ 4.217419] R13: 0000000000000000 R14: 0000000000000004 R15: 0000000000000000
> [ 4.218046] FS: 0000000000000000(0000) GS:ffff88001ae00000(0000) knlGS:0000000000000000
> [ 4.218775] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> [ 4.219280] CR2: ffffffffff1b8f68 CR3: 00000000193da002 CR4: 00000000003606f0
> [ 4.219931] Call Trace:
> [ 4.220156] <SYSENTER>
> [ 4.220383] ? async_page_fault+0x36/0x60
> [ 4.220768] ? invalid_op+0x22/0x40
> [ 4.221087] ? async_page_fault+0x36/0x60
> [ 4.221442] ? sync_regs+0x3c/0x40
> [ 4.221745] ? sync_regs+0x2e/0x40
> [ 4.222051] ? error_entry+0x6c/0xd0
> [ 4.222395] ? async_page_fault+0x36/0x60
> [ 4.222748] </SYSENTER>

Ah, page fault. I thought you were talking about an NMI. I get it now.

Did it overflow the stack? I think that would explain the question
marks.
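
For anyone following along, here is a rough userspace sketch of why a
trace can come out with a '?' on every line. It is only a model of the
idea, not the actual dumpstack code: the stack dumper scans every word
on the stack, and a text-looking address is printed without a '?' only
when it sits where the unwinder expected the next return address.
Everything named below (TEXT_START, TEXT_END, looks_like_text(),
dump_stack_words(), the ret_slots array) is made up for illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical text range, standing in for a kernel text address check. */
#define TEXT_START 0x1000UL
#define TEXT_END   0x2000UL

static bool looks_like_text(unsigned long word)
{
	return word >= TEXT_START && word < TEXT_END;
}

/*
 * Scan every word of a (simulated) stack.  ret_slots[] lists, in order,
 * the stack indexes where the (simulated) unwinder found return
 * addresses.  A text-looking word is "reliable" only if it is in the
 * next such slot; anything else is printed with a "? " prefix.
 *
 * Returns the number of unreliable entries written to out.
 */
static int dump_stack_words(const unsigned long *stack, size_t nr_words,
			    const size_t *ret_slots, size_t nr_slots,
			    char *out, size_t out_len)
{
	size_t next = 0, used = 0;
	int unreliable = 0;

	assert(out_len > 0);
	out[0] = '\0';

	for (size_t i = 0; i < nr_words; i++) {
		bool reliable;
		int n;

		if (!looks_like_text(stack[i]))
			continue;

		reliable = next < nr_slots && ret_slots[next] == i;
		if (reliable)
			next++;
		else
			unreliable++;

		n = snprintf(out + used, out_len - used, " %s%#lx\n",
			     reliable ? "" : "? ", stack[i]);
		if (n < 0 || (size_t)n >= out_len - used)
			break;	/* output buffer full, stop here */
		used += (size_t)n;
	}
	return unreliable;
}
```

In this model, if the unwinder gives up before producing any frames
(nr_slots == 0), every text address found by the scan is printed with a
"? " prefix, which is the shape of the trace quoted above.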

--
Josh