Re: [PATCH v3 00/13] Virtually mapped stacks with guard pages (x86, core)

From: Andy Lutomirski
Date: Fri Jun 24 2016 - 16:53:32 EST


On Fri, Jun 24, 2016 at 1:51 PM, Josh Poimboeuf <jpoimboe@xxxxxxxxxx> wrote:
> On Fri, Jun 24, 2016 at 03:25:30PM -0500, Josh Poimboeuf wrote:
>> On Fri, Jun 24, 2016 at 11:11:47AM -0700, Linus Torvalds wrote:
>> > On Fri, Jun 24, 2016 at 10:51 AM, Linus Torvalds
>> > <torvalds@xxxxxxxxxxxxxxxxxxxx> wrote:
>> > >
>> > > And in particular, the init_task stack initialization initialized it
>> > > to the init_thread pointer. Which was definitely deadly.
>> > >
>> > > Let's see if that was it..
>> >
>> > No, it's still broken. But it's *less* broken, so here's a new version
>> > of the patch that at least gets some of the stack setup right, in my
>> > hope that somebody will bother to look at this, and being less broken
>> > might mean that somebody sees what else I missed..
>>
>> I found at least one bug. The changing of task->stack from a "void *" to an
>> "unsigned long *":
>>
>> > - void *stack;
>> > + unsigned long *stack;
>>
>> That subtly changes the pointer arithmetic in do_boot_cpu():
>>
>>     idle->thread.sp = (unsigned long) (((struct pt_regs *)
>>                       (THREAD_SIZE + task_stack_page(idle))) - 1);
>>
>> That ends up adding 128k to the stack page bottom instead of 16k.
>>
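
The scaling described above can be reproduced in userspace. This is only a
minimal sketch, assuming a 16k THREAD_SIZE and an 8-byte unsigned long
(x86-64 LP64); fake_stack and both helper names are hypothetical and not
kernel code. It uses "char *" in place of "void *", since arithmetic on
"void *" is a GCC extension that steps in single bytes.

```c
#include <stddef.h>

#define THREAD_SIZE 16384UL  /* assumption: 16k kernel stacks */

/* Large enough that base + THREAD_SIZE is a valid one-past-the-end
 * pointer for both pointer types used below. */
static unsigned long fake_stack[THREAD_SIZE];

/* Byte-granular pointer: adding THREAD_SIZE advances THREAD_SIZE bytes. */
static size_t offset_as_bytes(void)
{
    char *base = (char *)fake_stack;

    return (size_t)((base + THREAD_SIZE) - base);
}

/* "unsigned long *": the same addition is scaled by sizeof(unsigned long). */
static size_t offset_as_longs(void)
{
    unsigned long *base = fake_stack;

    return (size_t)((char *)(base + THREAD_SIZE) - (char *)base);
}
```

With an 8-byte unsigned long, the first helper yields 16384 and the second
131072, which would match the 16k versus 128k figures quoted above.
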
>> But fixing that doesn't seem to fix this:
>>
>> [18446743832.576241] ------------[ cut here ]------------
>> [18446743832.576241] WARNING: CPU: 1 PID: 0 at /home/jpoimboe/git/linux/arch/x86/kernel/cpu/common.c:1434 cpu_init+0x34b/0x440
>> [18446743832.576241] Modules linked in:
>> [18446743832.576241] CPU: 1 PID: 0 Comm: swapper/1 Not tainted 4.7.0-rc4+ #47
>> [18446743832.576241] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.8.1-20150318_183358- 04/01/2014
>> [18446743832.576241] 0000000000000086 574e5e6c6855ace9 ffff88007c553e88 ffffffff8143cb83
>> [18446743832.576241] 0000000000000000 0000000000000000 ffff88007c553ec8 ffffffff810b0e7b
>> [18446743832.576241] 0000059a00000000 0000000000000000 0000000000000000 0000000000000000
>> [18446743832.576241] Call Trace:
>> [18446743832.576241] [<ffffffff8143cb83>] dump_stack+0x85/0xc2
>> [18446743832.576241] [<ffffffff810b0e7b>] __warn+0xcb/0xf0
>> [18446743832.576241] [<ffffffff810b0fad>] warn_slowpath_null+0x1d/0x20
>> [18446743832.576241] [<ffffffff810491bb>] cpu_init+0x34b/0x440
>> [18446743832.576241] [<ffffffff8105ab7c>] start_secondary+0x1c/0x1a0
>> [18446743832.576241] ---[ end trace 924d57afbaca0720 ]---
>>
>> So there's at least another bug lurking..
>
> Found another bug:
>
> #define stack_smp_processor_id()                                    \
> ({                                                                  \
>         struct thread_info *ti;                                     \
>         __asm__("andq %%rsp,%0; ":"=r" (ti) : "0" (CURRENT_MASK));  \
>         ti->cpu;                                                    \
> })
>
> That macro is obviously no longer valid.
>
> That seems to cause the above warning. When trying to boot CPU 1,
> cpu_init() calls the above macro which incorrectly returns 0.
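
For readers following along, a rough userspace sketch of what that macro
relies on: masking the stack pointer down to the stack base and reading
->cpu from whatever sits there. It assumes a 16k, 16k-aligned stack and
that CURRENT_MASK is ~(THREAD_SIZE - 1). struct fake_thread_info,
cpu_from_sp() and demo() are hypothetical names for illustration only, and
the zeroed buffer merely stands in for a stack base that no longer holds a
thread_info.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

#define THREAD_SIZE  16384UL                /* assumption: 16k stacks */
#define CURRENT_MASK (~(THREAD_SIZE - 1))   /* assumption: mask to stack base */

struct fake_thread_info {                   /* hypothetical stand-in */
    unsigned int cpu;
};

/* Same idea as the macro: round an in-stack address down to the stack
 * base and trust that a thread_info lives there. */
static unsigned int cpu_from_sp(uintptr_t sp)
{
    const struct fake_thread_info *ti =
        (const struct fake_thread_info *)(sp & CURRENT_MASK);

    return ti->cpu;
}

/* Build an aligned fake stack, optionally place the thread_info at its
 * base, then look up the CPU from an address near the top. */
static unsigned int demo(int thread_info_on_stack, unsigned int cpu)
{
    struct fake_thread_info ti = { .cpu = cpu };
    unsigned char *stack = aligned_alloc(THREAD_SIZE, THREAD_SIZE);
    unsigned int seen;

    assert(stack != NULL);
    memset(stack, 0, THREAD_SIZE);
    if (thread_info_on_stack)
        memcpy(stack, &ti, sizeof(ti));

    seen = cpu_from_sp((uintptr_t)(stack + THREAD_SIZE - 64));
    free(stack);
    return seen;
}
```

In this sketch demo(1, 1) reads back 1, while demo(0, 1) reads 0 from the
zeroed base, which is at least consistent with CPU 1 being reported as 0.
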

Fixed in my queue by removing the function:

https://git.kernel.org/cgit/linux/kernel/git/luto/linux.git/commit/?h=x86/vmap_stack&id=01b1a4b6fd629820625b64ca6e17c987f2ee8c09



--
Andy Lutomirski
AMA Capital Management, LLC