Re: [crash] PANIC: double fault, error_code: 0x0

From: Ingo Molnar
Date: Fri Nov 24 2017 - 17:53:27 EST



* Andy Lutomirski <luto@xxxxxxxxxxxxxx> wrote:

> > Note that if *any* of those 4 padding sequences is removed, the kernel starts
> > crashing again. Also note that the exact size of the padding does not appear to
> > matter - it could be larger as well, i.e. I don't think it's an alignment bug.
> >
> > In any case, the problem does not appear to be in the actual assembly code
> > paths themselves.
> >
> > One guess would be that it's some sort of sizing bug: maybe the padding forces a
> > key piece of data or code onto another page - but I'm too tired to root-cause it
> > right now.
> >
> > Any ideas?
>
> This smells like a pagetable setup bug. Either the pagetables are a bit broken,
> or they're totally busted and the padding gets something into a more TLB-friendly
> place.

Also note that the delta patch below keeps it working too, i.e. doubling the
first padding and eliminating the second one.

I.e. it's the total padding per IRQ entry that matters, not the exact placement
of the padding.

I.e. some sort of sizing bug, in the IDT and/or the pagetables.

(Also note that my config has NR_CPUS set to 128, whereas the defconfigs use 64.)

Thanks,

Ingo

---
arch/x86/entry/entry_64.S | 4 ++--
1 file changed, 2 insertions(+), 2 deletions(-)

Index: linux/arch/x86/entry/entry_64.S
===================================================================
--- linux.orig/arch/x86/entry/entry_64.S
+++ linux/arch/x86/entry/entry_64.S
@@ -548,7 +548,7 @@ END(irq_entries_start)
.Lokay_\@:
addq $8, %rsp
#else
- .rep 16; nop; .endr
+ .rep 32; nop; .endr
#endif
.endm

@@ -600,7 +600,7 @@ END(irq_entries_start)
ud2
.Lirq_stack_okay\@:
#else
- .rep 16; nop; .endr
+// .rep 16; nop; .endr
#endif

.Lirq_stack_push_old_rsp_\@: