Re: [crash] PANIC: double fault, error_code: 0x0

From: Ingo Molnar
Date: Sat Nov 25 2017 - 04:21:51 EST



* Ingo Molnar <mingo@xxxxxxxxxx> wrote:

>
> * Andy Lutomirski <luto@xxxxxxxxxxxxxx> wrote:
>
> > > Note that if *any* of those 4 padding sequences is removed, the kernel starts
> > > crashing again. Also note that the exact size of the padding appears to be not
> > > material - it could be larger as well, i.e. it's not an alignment bug I think.
> > >
> > > In any case, it appears not to be a problem in the actual assembly code
> > > paths themselves.
> > >
> > > One guess would be that it's some sort of sizing bug: maybe the padding forces a
> > > key piece of data or code on another page - but I'm too tired to root cause it
> > > right now.
> > >
> > > Any ideas?
> >
> > This smells like a pagetable setup bug. Either the pagetables are a bit broken or they're totally busted and the padding gets something in a more TLB-friendly place.
>
> Also note that the delta patch below also keeps it working, i.e. doubling the
> first padding and eliminating the second padding.
>
> I.e. it's the total per IRQ entry padding that matters, not the exact placement of
> the padding.
>
> I.e. some sort of sizing bug - IDT and/or the pagetables.
>
> (Also note that in my config NR_CPUS is at 128 - defconfigs are 64.)

The simplest padding I found is the one below - this indicates some sort of
section sizing or page table setup bug (or page alignment bug), and makes races
and other kinds of bugs less likely as the cause.
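
For illustration only, here is a minimal C sketch of the sizing hypothesis: padding
added to earlier macro expansions shifts everything after it, and can move a given
symbol onto a different 4 KiB page. The helper names and all offsets are hypothetical
and are not taken from the kernel or from this crash; only the 64-byte pad size
matches the patch below.

```c
#include <stdint.h>

#define PAGE_SIZE 4096u	/* x86 base page size */

/* Index of the 4 KiB page that contains a byte offset within a section. */
static inline uint64_t page_index(uint64_t offset)
{
	return offset / PAGE_SIZE;
}

/*
 * New offset of a symbol once each of the `expansions` macro expansions
 * emitted before it has grown by `pad_bytes`. Hypothetical helper.
 */
static inline uint64_t shifted_offset(uint64_t offset, uint64_t expansions,
				      uint64_t pad_bytes)
{
	return offset + expansions * pad_bytes;
}

/* Nonzero if the added padding moves the symbol onto a different page. */
static inline int crosses_page(uint64_t offset, uint64_t expansions,
			       uint64_t pad_bytes)
{
	return page_index(offset) !=
	       page_index(shifted_offset(offset, expansions, pad_bytes));
}
```

With made-up numbers, a symbol at offset 4000 preceded by two expansions padded by
64 bytes each ends up at offset 4128, i.e. on the next page, while a symbol at
offset 100 stays on the same page. That would be consistent with the total padding
mattering more than its exact placement, but it does not prove it.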

Thanks,

Ingo

=================>
arch/x86/entry/entry_64.S | 2 ++
1 file changed, 2 insertions(+)

diff --git a/arch/x86/entry/entry_64.S b/arch/x86/entry/entry_64.S
index 4ac952080869..ea992ca4e74f 100644
--- a/arch/x86/entry/entry_64.S
+++ b/arch/x86/entry/entry_64.S
@@ -547,6 +547,8 @@ END(irq_entries_start)
 	ud2
 .Lokay_\@:
 	addq $8, %rsp
+#else
+	.rep 64; nop; .endr
 #endif
 .endm