Re: [Bug 199965] New: Memory management: BUG in kernel_restart

From: Greg Kroah-Hartman
Date: Sat Jun 09 2018 - 10:07:56 EST


On Fri, Jun 08, 2018 at 03:15:08PM -0700, Andrew Morton wrote:
>
> (switched to email. Please respond via emailed reply-to-all, not via the
> bugzilla web interface).
>
> On Thu, 07 Jun 2018 18:21:24 +0000 bugzilla-daemon@xxxxxxxxxxxxxxxxxxx wrote:
>
> > https://bugzilla.kernel.org/show_bug.cgi?id=199965
> >
> > Bug ID: 199965
> > Summary: Memory management: BUG in kernel_restart
> > Product: Memory Management
> > Version: 2.5
> > Kernel Version: 4.17.0
> > Hardware: All
> > OS: Linux
> > Tree: Mainline
> > Status: NEW
> > Severity: normal
> > Priority: P1
> > Component: Other
> > Assignee: akpm@xxxxxxxxxxxxxxxxxxxx
> > Reporter: mlen@xxxxxxx
> > Regression: No
> >
> > Reboot randomly fails on 4.17.0 due to memory management issues. Worked fine on
> > 4.16.13
>
> Oh gee, there isn't much to go on here. Unknown kobject on
> devices_kset() is in a crappy state during kernel restart. Greg, is
> there something we can do to make that kobject_get() warning more
> informative? Probably not.
>
>
> > <4>[21100.397182] ------------[ cut here ]------------
> > <4>[21100.397185] kobject: '(null)' (0000000047d32b91): is not initialized, yet
> > kobject_get() is being called.

I don't know how to get any more informative than this :)


> > <4>[21100.397209] WARNING: CPU: 1 PID: 25848 at lib/kobject.c:593
> > kobject_get+0x21/0x32
> > <4>[21100.397211] Modules linked in:
> > <4>[21100.397215] CPU: 1 PID: 25848 Comm: reboot Not tainted 4.17.0-gentoo #2
> > <4>[21100.397217] Hardware name: ASUSTeK COMPUTER INC. Z10PE-D16 WS/Z10PE-D16
> > WS, BIOS 3407 03/10/2017
> > <4>[21100.397219] RIP: 0010:kobject_get+0x21/0x32
> > <4>[21100.397220] RSP: 0018:ffffa6c6cd9d3db0 EFLAGS: 00010296
> > <4>[21100.397223] RAX: 0000000000000000 RBX: ffff8d6af5012da8 RCX:
> > 0000000000000002
> > <4>[21100.397225] RDX: 0000000000000003 RSI: 0000000000000003 RDI:
> > 00000000ffffffff
> > <4>[21100.397227] RBP: ffff8d6af3dc9800 R08: 0000baada7db872a R09:
> > ffff8d69a1bc5cd8
> > <4>[21100.397228] R10: ffffa6c6cd9d3ce8 R11: ffffffffa7264f7d R12:
> > ffff8d6af50099a0
> > <4>[21100.397230] R13: ffffffffa57dfb43 R14: ffff8d6af3dc8060 R15:
> > 0000000000000000
> > <4>[21100.397232] FS: 00007efef9e42500(0000) GS:ffff8d6afd800000(0000)
> > knlGS:0000000000000000
> > <4>[21100.397233] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> > <4>[21100.397235] CR2: 0000561f1e29c4d8 CR3: 00000010277fc005 CR4:
> > 00000000003606e0
> > <4>[21100.397237] DR0: 0000000000000000 DR1: 0000000000000000 DR2:
> > 0000000000000000
> > <4>[21100.397238] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7:
> > 0000000000000400
> > <4>[21100.397240] Call Trace:
> > <4>[21100.397246] get_device+0x16/0x1b
> > <4>[21100.397249] device_shutdown+0x48/0x1a3
> > <4>[21100.397256] kernel_restart+0xe/0x4d
> > <4>[21100.397259] __do_sys_reboot+0x168/0x1c5
> > <4>[21100.397264] ? sched_clock_cpu+0x10/0xb4
> > <4>[21100.397266] ? sched_clock_cpu+0x10/0xb4
> > <4>[21100.397270] ? cycles_2_ns+0x55/0x75
> > <4>[21100.397276] ? task_work_run+0x63/0x8a
> > <4>[21100.397284] ? _raw_spin_unlock_irq+0x2f/0x41
> > <4>[21100.397287] ? task_work_run+0x63/0x8a
> > <4>[21100.397292] do_syscall_64+0x5e/0x6c
> > <4>[21100.397295] entry_SYSCALL_64_after_hwframe+0x44/0xa9

Here's the full callstack, but yeah, it's not obvious which device is
having the problem, which isn't good. I don't know what to suggest
here.

Does 'git bisect' help out to narrow down the problem?
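
As a rough sketch of what that could look like: the report says 4.16.13
worked and 4.17.0 fails, so a mainline bisect would run between the v4.16
and v4.17 tags, assuming plain v4.16 is also good on this machine. Since
the warning only shows up while rebooting, each step needs the console
output of the reboot captured somehow (serial console, pstore, ...). The
helper name and the log file it reads are hypothetical; only the warning
text is taken from the trace above.

```shell
# Manual bisect workflow (run inside a mainline kernel tree):
#
#   git bisect start
#   git bisect bad v4.17
#   git bisect good v4.16     # assumption: v4.16 behaves like 4.16.13
#   ... build, install, boot, reboot a few times, capture the log ...
#   classify_boot_log /path/to/captured.log && git bisect good \
#                                            || git bisect bad
#   ... repeat until git names the first bad commit ...
#   git bisect reset

# classify_boot_log FILE
# Hypothetical helper: returns 1 ("bad") if the captured log contains the
# kobject warning from this report, 0 ("good") otherwise.
classify_boot_log() {
    # -F: match the string literally, -q: no output, exit status only
    if grep -qF "is not initialized, yet kobject_get() is being called" "$1"; then
        return 1
    fi
    return 0
}
```

The failure is reported as random, so one clean reboot does not prove a
commit is good; it is worth rebooting several times at each step before
marking it good.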

thanks,

greg k-h