Re: [crash] PANIC: double fault, error_code: 0x0

From: Andy Lutomirski
Date: Fri Nov 24 2017 - 16:00:05 EST


On Fri, Nov 24, 2017 at 12:22 PM, Ingo Molnar <mingo@xxxxxxxxxx> wrote:
>
> * Ingo Molnar <mingo@xxxxxxxxxx> wrote:
>
>> This is a repost of the latest entry-stack plus Kaiser bits from Andy Lutomirski
>> (v3 series from today) and Dave Hansen (kaiser-414-tipwip-20171123 version),
>> on top of latest tip:x86/urgent (12a78d43de76).
>>
>> This version is pretty well tested, at least on the usual x86 tree test systems.
>> It has a couple of merge mistakes fixed, the biggest difference is in patch #22:
>>
>> x86/mm/kaiser: Prepare assembly for entry/exit CR3 switching
>>
>> The other patches are identical or very close to what I posted earlier today.
>
> Here's a new bug, on a testsystem I get the double fault boot crash attached
> below. The same bzImage crashes on other systems as well, so it's not CPU
> dependent.
>
> Via Kconfig-bisection I have narrowed it down to the following .config detail:
> it's triggered by _disabling_ CONFIG_DEBUG_ENTRY and enabling CONFIG_KAISER=y.
>
> I.e. one of the sanity checks of CONFIG_DEBUG_ENTRY has some positive side effect.

That's weird and definitely not intentional.

> I'll try to track down which one it is - any ideas meanwhile?
>
> Thanks,
>
> Ingo
>
> [ 8.797733] calling pt_dump_init+0x0/0x3b @ 1
> [ 8.803144] initcall pt_dump_init+0x0/0x3b returned 0 after 1 usecs
> [ 8.810589] calling aes_init+0x0/0x11 @ 1
> [ 8.815757] initcall aes_init+0x0/0x11 returned 0 after 141 usecs
> [ 8.823020] calling ghash_pclmulqdqni_mod_init+0x0/0x54 @ 1
> [ 8.831002] PANIC: double fault, error_code: 0x0

The double fault is most likely a stack overflow on the SYSENTER stack
caused by the page fault. You could try increasing the SYSENTER stack's
[64] array size to something larger (PAGE_SIZE/8, perhaps) to see
whether the stack trace gets better.

> [ 8.831002] CPU: 11 PID: 260 Comm: modprobe Not tainted 4.14.0-01419-g1b46550a680d-dirty #17
> [ 8.831002] Hardware name: Intel Corporation S2600GZ/S2600GZ, BIOS SE5C600.86B.02.02.0002.122320131210 12/23/2013
> [ 8.831002] task: ffff880828ba8000 task.stack: ffffc90004444000
> [ 8.831002] RIP: 0010:page_fault+0x11/0x60
> [ 8.831002] RSP: 0000:ffffffffff0e7fc8 EFLAGS: 00010046
> [ 8.831002] RAX: 00000000819d4d77 RBX: 0000000000000001 RCX: ffffffff819d4d77
> [ 8.831002] RDX: 0000000000000003 RSI: 0000000000000010 RDI: ffffffffff0e8078
> [ 8.831002] RBP: 0000000000000000 R08: 00007ffd7f1aa530 R09: 00007f9407f98400
> [ 8.831002] R10: 0000000000000007 R11: 0000000000000000 R12: 00007ffd7f1aa680
> [ 8.831002] R13: 00007f9407f91f80 R14: 0000000000000007 R15: 0000000000000000
> [ 8.831002] FS: 00007f9407f8f700(0000) GS:ffff88082e640000(0000) knlGS:0000000000000000
> [ 8.831002] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> [ 8.831002] CR2: ffffffffff0e7fb8 CR3: 0000000828bc4000 CR4: 00000000001406e0

Sadly CR2 is likely useless at this point because the double fault
will have clobbered it.

> [ 8.831002] Call Trace:
> [ 8.831002] <SYSENTER>
> [ 8.831002] ? __do_page_fault+0x4c0/0x4c0
> [ 8.831002] ? page_fault+0x2c/0x60
> [ 8.831002] ? native_iret+0x7/0x7

This is weird. native_iret+7 is IRETQ, but that should only appear at
the very top of the stack unless it's a nested entry. But a nested
IRETQ should never fail, because it returns to a kernel context, which
is by construction always valid.

> [ 8.831002] ? __do_page_fault+0x4c0/0x4c0
> [ 8.831002] ? page_fault+0x2c/0x60

It looks like the initial entry may have been a page fault from user mode.

> [ 8.831002] ? __entry_text_end+0x1/0x1

Um, what?

Josh, do you know why these stack traces are crappy? I think they
should unwind perfectly with ORC enabled. My guess is that the stack
access check is failing because RSP is out of bounds, but it shouldn't
need access to the out-of-bounds part to unwind back to where it's in
bounds.

Anyway, my best guess is that there's an error in the page tables
that's causing GDT access to fault. But even that's a rather weak
theory, since we shouldn't be executing page_fault on the SYSENTER
stack under any circumstances. Perhaps one of the CR3 switches is
page faulting or is selecting bogus page tables, causing an immediate
nested fault. On a nested fault, the stack switch won't happen
because the saved regs indicate kernel mode.

Maybe CONFIG_DEBUG_ENTRY should add some logic to the beginning of
page_fault to detect this condition and deliberately double-fault to
get a better trace. That would be a bit nontrivial, though.

> [ 8.831002] </SYSENTER>
> [ 8.831002] Code: ff e8 a4 75 6a ff e9 9f 02 00 00 66 66 66 66 66 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 00 48 83 c4 88 f6 84 24 88 00 00 00 03 75 20 <e8> 4a 01 00 00 48 89 e7 48 8b 74 24 78 48 c7 44 24 78 ff ff ff
> [ 8.831002] Kernel panic - not syncing: Machine halted.
> [ 8.831002] CPU: 11 PID: 260 Comm: modprobe Not tainted 4.14.0-01419-g1b46550a680d-dirty #17
> [ 8.831002] Hardware name: Intel Corporation S2600GZ/S2600GZ, BIOS SE5C600.86B.02.02.0002.122320131210 12/23/2013
> [ 8.831002] Call Trace:
> [ 8.831002] <#DF>
> [ 8.831002] dump_stack+0x46/0x62
> [ 8.831002] panic+0xde/0x221
> [ 8.831002] df_debug+0x29/0x30
> [ 8.831002] do_double_fault+0x8f/0x120
> [ 8.831002] double_fault+0x22/0x30
> [ 8.831002] RIP: 0010:page_fault+0x11/0x60
> [ 8.831002] RSP: 0000:ffffffffff0e7fc8 EFLAGS: 00010046
> [ 8.831002] RAX: 00000000819d4d77 RBX: 0000000000000001 RCX: ffffffff819d4d77
> [ 8.831002] RDX: 0000000000000003 RSI: 0000000000000010 RDI: ffffffffff0e8078
> [ 8.831002] RBP: 0000000000000000 R08: 00007ffd7f1aa530 R09: 00007f9407f98400
> [ 8.831002] R10: 0000000000000007 R11: 0000000000000000 R12: 00007ffd7f1aa680
> [ 8.831002] R13: 00007f9407f91f80 R14: 0000000000000007 R15: 0000000000000000
> [ 8.831002] ? native_iret+0x7/0x7
> [ 8.831002] WARNING: can't dereference iret registers at ffffffffff0e8048 for ip page_fault+0x11/0x60
> [ 8.831002] </#DF>
> [ 8.831002] <SYSENTER>
> [ 8.831002] ? __do_page_fault+0x4c0/0x4c0
> [ 8.831002] ? page_fault+0x2c/0x60
> [ 8.831002] ? native_iret+0x7/0x7
> [ 8.831002] ? __do_page_fault+0x4c0/0x4c0
> [ 8.831002] ? page_fault+0x2c/0x60
> [ 8.831002] ? __entry_text_end+0x1/0x1
> [ 8.831002] </SYSENTER>
> [ 8.831002] Kernel Offset: disabled
> [ 8.831002] Rebooting in 1 seconds..
> [ 8.831002] ACPI MEMORY or I/O RESET_REG.
>