Re: BUG: unable to handle kernel paging request in __switch_to

From: Linus Torvalds
Date: Thu Dec 14 2017 - 14:28:23 EST


On Thu, Dec 14, 2017 at 10:54 AM, Andy Lutomirski <luto@xxxxxxxxxx> wrote:
>
> 2. It actually tries to handle the breakpoint. A breakpoint is a
> benign exception, so any exception encountered while delivering it
> would result in serial delivery.

I don't think that's the case. "int3" is entirely synchronous, and
doesn't have the same odd issues as a breakpoint trap (which honors RF
etc). It's literally just a one-byte shorthand for "int $3".

There should be no serial delivery, although obviously if it's a trap
gate (as opposed to an interrupt gate), you can get a normal external
interrupt on the first instruction of the exception handler.

But that's not what the oops says: it says it happens on the "int3" instruction.

Now, it is possible that the "int3" was written _after_ the CPU took a
real page fault on the original instruction, and that the original
instruction actually caused a perfectly normal page fault, and then we
just report the "int3" because another CPU overwrote the instruction
after the original instruction had already trapped.

But that makes very little sense either. I really do think it's the
"int3" itself that causes the page fault due to some IDT/GDT change.
Because that would actually make sense considering what has changed in
the tree that Thomas is running.

Plus I think the instruction that gets overwritten is just a 5-byte
nop, isn't it? So it really shouldn't take a fault without the "int3"
overwriting.

[ Goes back to the original report ]

Yeah, so looking back at the "Code:" line, the faulting instruction
looked like this:

<cc> 1f 44 00 00

and a P6_NOP5 is

#define P6_NOP5 0x0f,0x1f,0x44,0x00,0

so it's definitely "first byte of a 5-byte nop has been overwritten
with an 'int3' instruction". The nop does not fault on its own.
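
The byte comparison above can be written out as a small userspace
sketch. This is only an illustration, not kernel code: the helper name
is_int3_patched_nop5() and the INT3_OPCODE macro are made up for this
example, and the only things taken from the thread are the P6_NOP5
definition and the bytes from the "Code:" line. It checks that the
first byte is 0xcc (the one-byte "int3" encoding) and that the
remaining four bytes match the tail of the nop.

```c
#include <string.h>

/* Same definition as quoted above. */
#define P6_NOP5 0x0f,0x1f,0x44,0x00,0

/* Hypothetical name for this sketch: 0xcc is the one-byte int3 opcode. */
#define INT3_OPCODE 0xcc

/*
 * Returns 1 if the five bytes at 'code' look like a P6_NOP5 whose
 * first byte has been replaced by int3, 0 otherwise.
 */
static int is_int3_patched_nop5(const unsigned char *code)
{
	static const unsigned char nop5[] = { P6_NOP5 };

	if (code[0] != INT3_OPCODE)
		return 0;

	/* Bytes 1..4 must still be the tail of the original nop. */
	return memcmp(code + 1, nop5 + 1, sizeof(nop5) - 1) == 0;
}
```

Feeding it the bytes from the oops (cc 1f 44 00 00) returns 1, while
an untouched nop (0f 1f 44 00 00) returns 0.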

Linus