Re: Oops with tip/x86/fpu

From: Oleg Nesterov
Date: Wed Mar 04 2015 - 14:09:59 EST


Thanks. I'll try to investigate tomorrow.

Well, the kernel crashes because xrstor_state() is buggy; Quentin already
has a fix.

But #GP should be explained...
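
For reference while digging: as far as I read the SDM, XRSTORS raises #GP
on a handful of XSAVE-header conditions, one of them being a header that is
not in compacted format. Below is a rough userspace sketch of those checks,
not kernel code. The names xsave_hdr_sketch and xrstors_gp_reason() are made
up for illustration, and "enabled" stands for XCR0 | IA32_XSS. It does not
say which condition (if any of these) actually fired in this oops.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* The 64-byte XSAVE header that follows the 512-byte legacy region. */
struct xsave_hdr_sketch {
	uint64_t xstate_bv;   /* components that hold valid data */
	uint64_t xcomp_bv;    /* bit 63 set => compacted format */
	uint64_t reserved[6]; /* expected to be zero by XRSTORS */
};

#define XCOMP_BV_COMPACTED (1ULL << 63)

/*
 * Hypothetical helper: returns NULL if the header passes these checks,
 * otherwise a short reason why XRSTORS would be expected to raise #GP.
 * 'enabled' is assumed to be XCR0 | IA32_XSS.
 */
static const char *xrstors_gp_reason(const void *area, uint64_t enabled)
{
	struct xsave_hdr_sketch hdr;
	size_t i;

	if ((uintptr_t)area & 63)
		return "area not 64-byte aligned";

	/* Copy the header out instead of casting, to avoid aliasing issues. */
	memcpy(&hdr, (const unsigned char *)area + 512, sizeof(hdr));

	if (!(hdr.xcomp_bv & XCOMP_BV_COMPACTED))
		return "XCOMP_BV[63] clear (standard format)";
	if (hdr.xcomp_bv & ~XCOMP_BV_COMPACTED & ~enabled)
		return "XCOMP_BV sets a disabled component";
	if (hdr.xstate_bv & ~hdr.xcomp_bv)
		return "XSTATE_BV sets a bit missing from XCOMP_BV";
	for (i = 0; i < 6; i++)
		if (hdr.reserved[i])
			return "reserved header bytes not zero";

	return NULL;
}
```

Feeding it a zero-filled, 64-byte-aligned 576-byte buffer reports the
"standard format" case, which is what a buffer that was never set up for
XSAVES would look like.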

On 03/04, Dave Hansen wrote:
>
> I'm running a commit from the tip/x86/fpu branch: ae486033b98. It's on
> a system which I normally boot with 'noxsaves'. When I boot without
> 'noxsaves' it is getting a GPF around the time that init is forked off.

And I assume that (before this commit) the kernel runs fine if you boot
without 'noxsaves'?

>
> The full oops is below, but addr2line points to the "alternative_input("
> line in xrstor_state().
>
> The one that oopses has this in bootup:
>
> xsave: enabled xstate_bv 0x1f, cntxt size 0x3c0 using compacted form
>
> The one that works says:
>
> xsave: enabled xstate_bv 0x1f, cntxt size 0x440 using standard form
>
> I bisected it down to:
>
> > commit 110d7f7513bbb916b8654da9e2973ac5bed929a9
> > Author: Oleg Nesterov <oleg@xxxxxxxxxx>
> > Date: Mon Jan 19 19:52:12 2015 +0100
> >
> > x86/fpu: Don't abuse FPU in kernel threads if use_eager_fpu()
> >
> > AFAICS, there is no reason why kernel threads should have FPU context
> > even if use_eager_fpu() == T. Now that interrupted_kernel_fpu_idle()
> > does not check __thread_has_fpu() in the use_eager_fpu() case, we
> > can remove the init_fpu() code from eager_fpu_init() and change
> > flush_thread() called by do_execve() to initialize FPU.
> >
> > Note: of course, the change in flush_thread() is horrible and must be
> > cleanuped. We need the new helper, and flush_thread() should return the
> > error if init_fpu() fails.
>
> It disassembles to:
>
> > All code
> > ========
> > 0: 00 00 add %al,(%rax)
> > 2: 48 c7 c7 58 a4 12 82 mov $0xffffffff8212a458,%rdi
> > 9: e8 03 13 14 00 callq 0x141311
> > e: db e2 fnclex
> > 10: 0f 77 emms
> > 12: db 83 3c 05 00 00 fildl 0x53c(%rbx)
> > 18: 0f 1f 44 00 00 nopl 0x0(%rax,%rax,1)
> > 1d: b8 ff ff ff ff mov $0xffffffff,%eax
> > 22: 48 8b bb 40 05 00 00 mov 0x540(%rbx),%rdi
> > 29: 89 c2 mov %eax,%edx
> > 2b:* 48 0f c7 1f xrstors64 (%rdi) <-- trapping instruction
> > 2f: 31 c0 xor %eax,%eax
> > 31: 45 31 e4 xor %r12d,%r12d
> > 34: 85 c0 test %eax,%eax
> > 36: 48 c7 c7 a8 a4 12 82 mov $0xffffffff8212a4a8,%rdi
> > 3d: 41 rex.B
> > 3e: 0f .byte 0xf
> > 3f: 95 xchg %eax,%ebp
>
> ...
> > [ 14.193801] Freeing unused kernel memory: 560K (ffff880001974000 - ffff880001a00000)
> > [ 14.203661] Freeing unused kernel memory: 1916K (ffff880001e21000 - ffff880002000000)
> > [ 14.213132] general protection fault: 0000 [#1] SMP
> > [ 14.218786] Modules linked in:
> > [ 14.222273] CPU: 1 PID: 1 Comm: swapper/0 Not tainted 3.19.0-00430-gae48603-dirty #1428
> > [ 14.231375] Hardware name: Intel Corporation Skylake Client platform/Skylake Y LPDDR3 RVP3, BIOS SKLSE2P1.86C.X062.R00.1411270820 11/27/2014
> > [ 14.245698] task: ffff8801485a8000 ti: ffff880148620000 task.ti: ffff880148620000
> > [ 14.254189] RIP: 0010:[<ffffffff81004eda>] [<ffffffff81004eda>] math_state_restore+0x13a/0x380
> > [ 14.264076] RSP: 0000:ffff880148623b98 EFLAGS: 00010296
> > [ 14.270090] RAX: 00000000ffffffff RBX: ffff8801485a8000 RCX: 0000000000000000
> > [ 14.278186] RDX: 00000000ffffffff RSI: 0000000000000000 RDI: ffff88007f5f0000
> > [ 14.286277] RBP: ffff880148623bb8 R08: 0000000000000000 R09: ffff88007f5f0000
> > [ 14.294371] R10: 0000000000000001 R11: 0000000000000000 R12: ffff8801485a8000
> > [ 14.302468] R13: ffff88007f5e0000 R14: ffff8801485a8000 R15: ffffffff821ca800
> > [ 14.310574] FS: 0000000000000000(0000) GS:ffff88014e440000(0000) knlGS:0000000000000000
> > [ 14.319794] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> > [ 14.326323] CR2: 0000000000000000 CR3: 000000007f820000 CR4: 00000000003407e0
> > [ 14.334420] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
> > [ 14.342516] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
> > [ 14.350612] Stack:
> > [ 14.352896] ffff8801485a8000 0000000000000000 ffff8801485a8000 ffff88007f5e0000
> > [ 14.361366] ffff880148623be8 ffffffff8101210d 0000000000000000 ffff88007f590db0
> > [ 14.369810] ffff8801485a8000 ffff88007f5e0000 ffff880148623c58 ffffffff811f5074
> > [ 14.378267] Call Trace:
> > [ 14.381056] [<ffffffff8101210d>] flush_thread+0x1ad/0x270
> > [ 14.387281] [<ffffffff811f5074>] flush_old_exec+0x774/0xee0
> > [ 14.393702] [<ffffffff81256703>] load_elf_binary+0x353/0x1870
> > [ 14.400317] [<ffffffff811f3f47>] ? search_binary_handler+0x97/0x1f0
> > [ 14.407532] [<ffffffff810c491c>] ? do_raw_read_unlock+0x2c/0x50
> > [ 14.414361] [<ffffffff811f3f38>] search_binary_handler+0x88/0x1f0
> > [ 14.421374] [<ffffffff81255fc4>] load_script+0x274/0x2b0
> > [ 14.427503] [<ffffffff811f3ee8>] ? search_binary_handler+0x38/0x1f0
> > [ 14.434722] [<ffffffff810c491c>] ? do_raw_read_unlock+0x2c/0x50
> > [ 14.441563] [<ffffffff811f3f38>] search_binary_handler+0x88/0x1f0
> > [ 14.448577] [<ffffffff811f6436>] do_execveat_common.isra.32+0x746/0xa30
> > [ 14.456184] [<ffffffff811f6386>] ? do_execveat_common.isra.32+0x696/0xa30
> > [ 14.463988] [<ffffffff8194ad50>] ? rest_init+0x150/0x150
> > [ 14.470115] [<ffffffff811f674c>] do_execve+0x2c/0x30
> > [ 14.475848] [<ffffffff8100023b>] run_init_process+0x2b/0x30
> > [ 14.482264] [<ffffffff8194ad92>] kernel_init+0x42/0xf0
> > [ 14.488222] [<ffffffff8196b67c>] ret_from_fork+0x7c/0xb0
> > [ 14.494351] [<ffffffff8194ad50>] ? rest_init+0x150/0x150
> > [ 14.500481] Code: 00 00 48 c7 c7 58 a4 12 82 e8 03 13 14 00 db e2 0f 77 db 83 3c 05 00 00 0f 1f 44 00 00 b8 ff ff ff ff 48 8b bb 40 05 00 00 89 c2 <48> 0f c7 1f 31 c0 45 31 e4 85 c0 48 c7 c7 a8 a4 12 82 41 0f 95
> > [ 14.522792] RIP [<ffffffff81004eda>] math_state_restore+0x13a/0x380
> > [ 14.530031] RSP <ffff880148623b98>
> > [ 14.534061] ---[ end trace f99d58de7d83269b ]---
> > [ 14.539711] usb 1-5: New USB device found, idVendor=14dd, idProduct=1007
> > [ 14.549577] usb 1-5: New USB device strings: Mfr=1, Product=2, SerialNumber=7
> > [ 14.560957] usb 1-5: Product: D2CIM-DVUSB
> > [ 14.567717] usb 1-5: Manufacturer: Raritan
> > [ 14.573636] usb 1-5: SerialNumber: HUX45017210000007
> > [ 14.579421] Kernel panic - not syncing: Attempted to kill init! exitcode=0x0000000b
> > [ 14.579421]
> > [ 14.580548] usb 1-5: ep 0x81 - rounding interval to 64 microframes, ep desc says 80 microframes
> > [ 14.580595] usb 1-5: ep 0x82 - rounding interval to 64 microframes, ep desc says 80 microframes
> > [ 14.580634] usb 1-5: ep 0x83 - rounding interval to 64 microframes, ep desc says 80 microframes
> > [ 14.592305] input: Raritan D2CIM-DVUSB as /devices/pci0000:00/0000:00:14.0/usb1/1-5/1-5:1.0/0003:14DD:1007.0001/input/input7
> > [ 14.632243] Kernel Offset: 0x0 from 0xffffffff81000000 (relocation range: 0xffffffff80000000-0xffffffff9fffffff)
> > [ 14.656356] ---[ end Kernel panic - not syncing: Attempted to kill init! exitcode=0x0000000b
> > [ 14.656356]
> >
>
> Config is here:
>
> https://www.sr71.net/~dave/intel/config-20150303
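
A side note on the two "cntxt size" values in the boot messages quoted
above: they are consistent with the same xstate_bv 0x1f laid out in the two
XSAVE formats. The sketch below redoes that arithmetic. The offsets and
sizes in the table are assumptions (the values usually reported by CPUID
leaf 0xD for AVX and MPX); real code should read them from CPUID. The names
xcomp_sketch and xsave_size() are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

struct xcomp_sketch {
	unsigned bit;     /* bit number in xstate_bv */
	uint32_t std_off; /* offset in standard format (assumed) */
	uint32_t size;    /* component size in bytes (assumed) */
};

/* Assumed layout for components 2..4; normally enumerated via CPUID 0xD. */
static const struct xcomp_sketch xcomps[] = {
	{ 2,  576, 256 }, /* AVX: upper halves of the YMM registers */
	{ 3,  960,  64 }, /* MPX BNDREGS */
	{ 4, 1024,  64 }, /* MPX BNDCSR */
};

/* 512-byte legacy (x87/SSE) region plus the 64-byte XSAVE header. */
#define LEGACY_AND_HDR 576u

/* Size of the save area needed for the components set in 'bv'. */
static uint32_t xsave_size(uint64_t bv, int compacted)
{
	uint32_t end = LEGACY_AND_HDR;
	size_t i;

	for (i = 0; i < sizeof(xcomps) / sizeof(xcomps[0]); i++) {
		if (!(bv & (1ULL << xcomps[i].bit)))
			continue;
		if (compacted)
			end += xcomps[i].size; /* packed back to back */
		else if (xcomps[i].std_off + xcomps[i].size > end)
			end = xcomps[i].std_off + xcomps[i].size; /* fixed offsets */
	}
	return end;
}
```

Under those assumptions xsave_size(0x1f, 1) gives 0x3c0 and
xsave_size(0x1f, 0) gives 0x440, matching the two boot lines. The 0x80
difference is the gap the standard format leaves between the AVX state
and BNDREGS.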
