Re: 8b275b3754 ("x86/irq/64: Remap the IRQ stack with guard pages"): BUG: unable to handle kernel paging request at ffffb659000a1000

From: Andy Lutomirski
Date: Sat Apr 06 2019 - 10:01:52 EST


On Sat, Apr 6, 2019 at 6:54 AM Andy Lutomirski <luto@xxxxxxxxxx> wrote:
>
> On Fri, Apr 5, 2019 at 11:38 PM kernel test robot <lkp@xxxxxxxxx> wrote:
> >
> > Greetings,
> >
> > 0day kernel testing robot got the below dmesg and the first bad commit is
> >
> > https://git.kernel.org/pub/scm/linux/kernel/git/tip/tip.git WIP.x86/stackguards
> >
> > commit 8b275b3754465d502d393f8ae8dd355b7067e73f
> > Author: Andy Lutomirski <luto@xxxxxxxxxx>
> > AuthorDate: Fri Jul 13 19:01:23 2018 -0700
> > Commit: Thomas Gleixner <tglx@xxxxxxxxxxxxx>
> > CommitDate: Fri Apr 5 17:04:10 2019 +0200
> >
> > x86/irq/64: Remap the IRQ stack with guard pages
> >
> > The IRQ stack lives in percpu space, so an IRQ handler that overflows it
> > will overwrite other data structures.
> >
> > Use vmap() to remap the IRQ stack so that it will have the usual guard
> > pages that vmap/vmalloc allocations have. With this the kernel will panic
> > immediately on an IRQ stack overflow.
> >
> > [ tglx: Move the map code to a proper place and invoke it only when a CPU
> > is about to be brought online. No point in installing the map at
> > early boot for all possible CPUs. Fail the CPU bringup if the vmap
> > fails as done for all other preparatory stages in cpu hotplug. ]
> >
> > Signed-off-by: Andy Lutomirski <luto@xxxxxxxxxx>
> > Signed-off-by: Thomas Gleixner <tglx@xxxxxxxxxxxxx>
>
> I haven't spotted the actual bug yet, but the faulting instruction is:
>
> 2a: 65 8b 35 09 ca 75 63 mov %gs:*0x6375ca09(%rip),%esi # 0x6375ca3a <-- trapping instruction
>

Gah, -ETOOLITTLESLEEP. This is a bit strange:

e: 4c 8d 74 24 08 lea 0x8(%rsp),%r14
...
26: 49 83 c6 08 add $0x8,%r14
2a:* 4d 8b 7e f8 mov -0x8(%r14),%r15 <-- trapping instruction

That is an access to the stack a few bytes *above* RSP, so it can't be
an overflow, which would fault below RSP. Is something possibly screwy
with the mapping?
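
To spell out the arithmetic, here is a rough sketch in plain userspace C (not kernel code) of the address that load touches. It assumes the elided instructions leave %r14 alone and that the add has run only once; the helper names and the 4 KiB page size are made up for illustration. The only real datum is the fault address from the subject line, and the RSP value in the usage note is simply the one that would produce it under those assumptions.

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumption: 4 KiB pages, the usual x86-64 base page size. */
#define SKETCH_PAGE_SIZE 4096ULL

/*
 * Hypothetical helper: replay the three instructions from the
 * disassembly, assuming nothing in the elided part modifies %r14
 * and the add has executed exactly once.
 */
static uint64_t trapping_load_addr(uint64_t rsp)
{
	uint64_t r14 = rsp + 0x8;	/* lea 0x8(%rsp),%r14 */

	r14 += 0x8;			/* add $0x8,%r14 */
	return r14 - 0x8;		/* mov -0x8(%r14),%r15 reads from here */
}

/* Hypothetical helper: true if addr is the first byte of a page. */
static bool is_page_aligned(uint64_t addr)
{
	return (addr & (SKETCH_PAGE_SIZE - 1)) == 0;
}
```

Under those assumptions the load reads from RSP + 8, i.e. above the stack pointer. For what it's worth, the reported fault address ffffb659000a1000 is page aligned, and trapping_load_addr(0xffffb659000a0ff8) would return exactly that address. That RSP value is inferred, not taken from the dmesg.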

I might have a chance to debug this for real this evening. Right now
I'm about to try to wrangle a sick kid through an airport.