Re: [PATCH] x86/orc: Don't bail on stack overflow

From: Josh Poimboeuf
Date: Sat Nov 25 2017 - 23:49:14 EST


On Sat, Nov 25, 2017 at 10:41:15PM -0600, Josh Poimboeuf wrote:
> On Sat, Nov 25, 2017 at 08:25:12PM -0800, Andy Lutomirski wrote:
> > On Sat, Nov 25, 2017 at 6:40 PM, Josh Poimboeuf <jpoimboe@xxxxxxxxxx> wrote:
> > > On Sat, Nov 25, 2017 at 04:16:23PM -0800, Andy Lutomirski wrote:
> > >> Can you send me whatever config and exact commit hash generated this?
> > >> I can try to figure out why it failed.
> > >
> > > Sorry, I've been traveling. I just got some time to take a look at
> > > this. I think there are at least two unwinder issues here:
> > >
> > > - It doesn't deal gracefully with the case where the stack overflows and
> > > the stack pointer itself isn't on a valid stack but the
> > > to-be-dereferenced data *is*.
> > >
> > > - The oops dump code doesn't know how to print partial pt_regs, for the
> > > case where we get an interrupt/exception in *early* entry code
> > > before the full pt_regs have been saved.
> > >
> > > (Andy, I'm not quite sure about your patch, and whether it's still
> > > needed after these patches. I'll need to look at it later when I have
> > > more time.)
> >
> > I haven't tested yet, but I think my patch is probably still needed.
> > The issue I fixed is that unwind_start() would bail out early if sp
> > was below the stack. Also:
>
> Makes sense, maybe both are needed. Your patch deals with a bad SP at
> the beginning and mine deals with a bad SP in the middle.
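
To make the distinction concrete, here is a rough user-space sketch of the
two policies. It is not the unwinder code. The struct, the helper names and
the stack bounds are all made up for illustration; only the two addresses
used in the usage note are taken from the dump further down.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for a stack's bounds: [begin, end). */
struct stack_range {
	uint64_t begin;
	uint64_t end;
};

/* True if the len bytes starting at addr lie entirely inside the stack. */
static bool range_on_stack(const struct stack_range *s, uint64_t addr, size_t len)
{
	return addr >= s->begin && addr < s->end && len <= s->end - addr;
}

/* Strict policy: give up unless SP itself is on the stack. */
static bool can_read_strict(const struct stack_range *s, uint64_t sp,
			    uint64_t addr, size_t len)
{
	return range_on_stack(s, sp, 1) && range_on_stack(s, addr, len);
}

/* Lenient policy: ignore SP and only validate the data to be read. */
static bool can_read_lenient(const struct stack_range *s, uint64_t addr, size_t len)
{
	return range_on_stack(s, addr, len);
}
```

With assumed bounds of ffffffffff084000..ffffffffff084200, an SP of
ffffffffff083fc8 is off the stack, so the strict check refuses to read a
40-byte frame at ffffffffff084070, while the lenient check allows it.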

I was able to recreate it with the config Ingo posted earlier, plus the
following patch:

diff --git a/include/asm-generic/vmlinux.lds.h b/include/asm-generic/vmlinux.lds.h
index e12168936d3f..693a20d309e3 100644
--- a/include/asm-generic/vmlinux.lds.h
+++ b/include/asm-generic/vmlinux.lds.h
@@ -500,7 +500,7 @@
 		VMLINUX_SYMBOL(__entry_text_end) = .;

 #define IRQENTRY_TEXT							\
-		ALIGN_FUNCTION();					\
+		. = ALIGN(4096);					\
 		VMLINUX_SYMBOL(__irqentry_text_start) = .;		\
 		*(.irqentry.text)					\
 		VMLINUX_SYMBOL(__irqentry_text_end) = .;
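
For anyone not fluent in linker scripts: ". = ALIGN(4096)" rounds the
location counter up to the next 4096-byte boundary, so the patch places
__irqentry_text_start at the start of a 4 KiB page. The arithmetic amounts to
roughly the following. align_up() is a hypothetical helper written for this
mail, not a kernel function, and it assumes the alignment is a power of two.

```c
#include <stdint.h>

/*
 * Round addr up to the next multiple of align.
 * Assumption: align is a nonzero power of two.
 */
static uint64_t align_up(uint64_t addr, uint64_t align)
{
	return (addr + align - 1) & ~(align - 1);
}
```

An address that is already aligned is returned unchanged; anything else moves
up to the next boundary.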


The output looks a *lot* better with both my patches and yours applied. It
probably would have helped Ingo and Thomas figure out the problem a lot sooner:

[ 1.159016] PANIC: double fault, error_code: 0x0
[ 1.159583] CPU: 1 PID: 68 Comm: modprobe Not tainted 4.14.0-01257-g761d390195b6-dirty #19
[ 1.159583] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.10.2-1.fc26 04/01/2014
[ 1.159583] task: ffff880136f6bb00 task.stack: ffffc90000984000
[ 1.159583] RIP: 0010:page_fault+0x11/0x60
[ 1.159583] RSP: 0000:ffffffffff083fc8 EFLAGS: 00010046
[ 1.159583] RAX: 00000000819d0a87 RBX: 0000000000000001 RCX: ffffffff819d0a87
[ 1.159583] RDX: 0000000000001000 RSI: 0000000000000010 RDI: ffffffffff084078
[ 1.159583] RBP: 0000000000000d68 R08: 00007f6d6bb24278 R09: 0000000000000023
[ 1.159583] R10: 0000558e0feca600 R11: 0000000000000246 R12: 00007f6d6bb203c0
[ 1.159583] R13: 00007f6d6bb1f880 R14: 00007ffff793bebc R15: 0000000000000100
[ 1.159583] FS: 00007f6d6c39c5c0(0000) GS:ffff88013fc40000(0000) knlGS:0000000000000000
[ 1.159583] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 1.159583] CR2: ffffffffff083fb8 CR3: 0000000136f78002 CR4: 00000000001606e0
[ 1.159583] Call Trace:
[ 1.159583] <SYSENTER>
[ 1.159583] __do_page_fault+0x4b0/0x4b0
[ 1.159583] page_fault+0x2c/0x60
[ 1.159583] RIP: 0010:do_page_fault+0x0/0x100
[ 1.159583] RSP: 0000:ffffffffff084120 EFLAGS: 00010012
[ 1.159583] RAX: 00000000819d0a87 RBX: 0000000000000001 RCX: ffffffff819d0a87
[ 1.159583] RDX: 0000000000001000 RSI: 0000000000000010 RDI: ffffffffff084128
[ 1.159583] RBP: 0000000000000d68 R08: 00007f6d6bb24278 R09: 0000000000000023
[ 1.159583] R10: 0000558e0feca600 R11: 0000000000000246 R12: 00007f6d6bb203c0
[ 1.159583] R13: 00007f6d6bb1f880 R14: 00007ffff793bebc R15: 0000000000000100
[ 1.159583] ? native_iret+0x7/0x7
[ 1.159583] page_fault+0x2c/0x60
[ 1.159583] RIP: 0010:apic_timer_interrupt+0x0/0xb0
[ 1.159583] RSP: 0000:ffffffffff0841d8 EFLAGS: 00010046
[ 1.159583] RAX: 0000000000000374 RBX: 0000558e0feca2c0 RCX: 00007f6d6b85aaf0
[ 1.159583] RDX: 0000000000001000 RSI: 0000558e0feca600 RDI: 0000000000000000
[ 1.159583] RBP: 0000000000000d68 R08: 00007f6d6bb24278 R09: 0000000000000023
[ 1.159583] R10: 0000558e0feca600 R11: 0000000000000246 R12: 00007f6d6bb203c0
[ 1.159583] R13: 00007f6d6bb1f880 R14: 00007ffff793bebc R15: 0000000000000100
[ 1.159583] RIP: 0033:0x7f6d6b85aaf0
[ 1.159583] RSP: 002b:00007ffff793bd68 EFLAGS: 00000246 ORIG_RAX: 0000000000000003
[ 1.159583] RAX: ffffffff819d2000 RBX: 00007f6d6b85aaf0 RCX: 0000000000000010
[ 1.159583] RDX: 0000000000010046 RSI: ffffffffff0841d8 RDI: 0000000000000000
[ 1.159583] RBP: 0000000000000374 R08: ffffffffffffffff R09: 0000000000000000
[ 1.159583] R10: 0000558e0feca600 R11: 0000000000001000 R12: 00007f6d6bb24278
[ 1.159583] R13: 0000000000000023 R14: 0000558e0feca600 R15: 0000000000000246
[ 1.159583] </SYSENTER>
[ 1.159583] Code: ff e8 94 b7 6a ff e9 9f 02 00 00 0f 1f 44 00 00 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 00 48 83 c4 88 f6 84 24 88 00 00 00 03 75 20 <e8> 4a 01 00 00 48 89 e7 48 8b 74 24 78 48 c7 44 24 78 ff ff ff
[ 1.159583] Kernel panic - not syncing: Machine halted.
[ 1.159583] CPU: 1 PID: 68 Comm: modprobe Not tainted 4.14.0-01257-g761d390195b6-dirty #19
[ 1.159583] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.10.2-1.fc26 04/01/2014
[ 1.159583] Call Trace:
[ 1.159583] <#DF>
[ 1.159583] dump_stack+0x46/0x59
[ 1.159583] panic+0xde/0x223
[ 1.159583] df_debug+0x29/0x30
[ 1.159583] do_double_fault+0x9a/0x120
[ 1.159583] double_fault+0x22/0x30
[ 1.159583] RIP: 0010:page_fault+0x11/0x60
[ 1.159583] RSP: 0000:ffffffffff083fc8 EFLAGS: 00010046
[ 1.159583] RAX: 00000000819d0a87 RBX: 0000000000000001 RCX: ffffffff819d0a87
[ 1.159583] RDX: 0000000000001000 RSI: 0000000000000010 RDI: ffffffffff084078
[ 1.159583] RBP: 0000000000000d68 R08: 00007f6d6bb24278 R09: 0000000000000023
[ 1.159583] R10: 0000558e0feca600 R11: 0000000000000246 R12: 00007f6d6bb203c0
[ 1.159583] R13: 00007f6d6bb1f880 R14: 00007ffff793bebc R15: 0000000000000100
[ 1.159583] ? native_iret+0x7/0x7
[ 1.159583] </#DF>
[ 1.159583] <SYSENTER>
[ 1.159583] RIP: 0010:do_page_fault+0x0/0x100
[ 1.159583] RSP: 0000:ffffffffff084070 EFLAGS: 00010097
[ 1.159583] page_fault+0x2c/0x60
[ 1.159583] RIP: 0010:do_page_fault+0x0/0x100
[ 1.159583] RSP: 0000:ffffffffff084120 EFLAGS: 00010012
[ 1.159583] RAX: 00000000819d0a87 RBX: 0000000000000001 RCX: ffffffff819d0a87
[ 1.159583] RDX: 0000000000001000 RSI: 0000000000000010 RDI: ffffffffff084128
[ 1.159583] RBP: 0000000000000d68 R08: 00007f6d6bb24278 R09: 0000000000000023
[ 1.159583] R10: 0000558e0feca600 R11: 0000000000000246 R12: 00007f6d6bb203c0
[ 1.159583] R13: 00007f6d6bb1f880 R14: 00007ffff793bebc R15: 0000000000000100
[ 1.159583] ? native_iret+0x7/0x7
[ 1.159583] page_fault+0x2c/0x60
[ 1.159583] RIP: 0010:apic_timer_interrupt+0x0/0xb0
[ 1.159583] RSP: 0000:ffffffffff0841d8 EFLAGS: 00010046
[ 1.159583] RAX: 0000000000000374 RBX: 0000558e0feca2c0 RCX: 00007f6d6b85aaf0
[ 1.159583] RDX: 0000000000001000 RSI: 0000558e0feca600 RDI: 0000000000000000
[ 1.159583] RBP: 0000000000000d68 R08: 00007f6d6bb24278 R09: 0000000000000023
[ 1.159583] R10: 0000558e0feca600 R11: 0000000000000246 R12: 00007f6d6bb203c0
[ 1.159583] R13: 00007f6d6bb1f880 R14: 00007ffff793bebc R15: 0000000000000100
[ 1.159583] RIP: 0033:0x7f6d6b85aaf0
[ 1.159583] RSP: 002b:00007ffff793bd68 EFLAGS: 00000246 ORIG_RAX: 0000000000000003
[ 1.159583] RAX: ffffffff819d2000 RBX: 00007f6d6b85aaf0 RCX: 0000000000000010
[ 1.159583] RDX: 0000000000010046 RSI: ffffffffff0841d8 RDI: 0000000000000000
[ 1.159583] RBP: 0000000000000374 R08: ffffffffffffffff R09: 0000000000000000
[ 1.159583] R10: 0000558e0feca600 R11: 0000000000001000 R12: 00007f6d6bb24278
[ 1.159583] R13: 0000000000000023 R14: 0000558e0feca600 R15: 0000000000000246
[ 1.159583] </SYSENTER>
[ 1.159583] Kernel Offset: disabled
[ 1.159583] ---[ end Kernel panic - not syncing: Machine halted.
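
Note the entry in the second trace that shows only RIP, RSP and EFLAGS: that
is what a partial pt_regs looks like when only the hardware-pushed frame is
available. As a rough user-space illustration of printing just that part, the
sketch below formats the RSP/EFLAGS line. The struct and function names are
invented for this mail and are not the kernel's; the format string simply
mirrors the dump above.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/* Hypothetical stand-in for the part of pt_regs that the CPU pushes. */
struct iret_frame {
	uint64_t ip;
	uint64_t cs;
	uint64_t flags;
	uint64_t sp;
	uint64_t ss;
};

/* Format only the stack pointer and flags from the hardware frame. */
static int format_iret_sp(char *buf, size_t len, const struct iret_frame *f)
{
	return snprintf(buf, len, "RSP: %04x:%016llx EFLAGS: %08llx",
			(unsigned int)f->ss,
			(unsigned long long)f->sp,
			(unsigned long long)f->flags);
}
```

The general-purpose registers are simply left out, since in this case they
were never saved.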


--
Josh