Re: general protection fault on 3.15.6

From: Tejun Heo
Date: Mon Jul 21 2014 - 09:29:46 EST


Hello, Steven.

On Sun, Jul 20, 2014 at 09:27:42PM -0700, Steven Noonan wrote:
> My router/storage box suddenly stopped responding (originally noticed
> because dnsmasq wasn't responding) and I had to reboot it. I checked
> the systemd journal when it came back and these were the last thing in
> there for the previous boot. Any ideas about pinning down the cause?
>
> general protection fault: 0000 [#1] SMP
...
> CPU: 3 PID: 8881 Comm: systemd Tainted: P WC O 3.15.6 #1
> Hardware name: Shuttle Inc. SH67H/FH67H, BIOS 2.04 04/10/2013
> task: ffff8802f473d880 ti: ffff8802f0abc000 task.ti: ffff8802f0abc000
> RIP: 0010:[<ffffffff811ad226>] [<ffffffff811ad226>]
> __kmalloc_track_caller+0x86/0x260

So, a GPF in kmalloc,

> Call Trace:
> [<ffffffff8116fb11>] kstrdup+0x31/0x60

called from kstrdup()

> [<ffffffff8123a4f4>] __kernfs_new_node+0x34/0xf0
> [<ffffffff8123b386>] kernfs_new_node+0x26/0x50

which was invoked to copy the node name while creating a new kernfs
node.
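For readers following the trace, here is a rough userspace analogue of what
that copy step does. It is only a sketch: malloc() stands in for the kernel's
tracked kmalloc call, and the function name sketch_strdup is made up for
illustration. The point is that the request is sized by the string length and
served from a general-purpose allocation, not from a cache the caller set up.

```c
#include <stdlib.h>
#include <string.h>

/* Userspace analogue of a kstrdup()-style copy. Assumptions: malloc()
 * replaces the kernel allocator, and sketch_strdup is a hypothetical name. */
static char *sketch_strdup(const char *s)
{
    size_t len;
    char *buf;

    if (!s)
        return NULL;

    len = strlen(s) + 1;      /* include the trailing NUL */
    buf = malloc(len);        /* size-based request, no dedicated cache */
    if (buf)
        memcpy(buf, s, len);  /* copy the name into the new buffer */
    return buf;
}
```

Nothing here depends on what the string is used for, which is why a fault at
this step says little about the caller.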

> [<ffffffff8123cc59>] __kernfs_create_file+0x39/0xa0
> [<ffffffff810edb60>] cgroup_addrm_files+0x110/0x250
> [<ffffffff810ee9ab>] cgroup_mkdir+0x21b/0x540
> [<ffffffff8125ca36>] ? security_inode_notifysecctx+0x16/0x20
> [<ffffffff8123b30a>] kernfs_iop_mkdir+0x5a/0x90
> [<ffffffff811d3120>] vfs_mkdir+0xe0/0x180
> [<ffffffff811d7bea>] SyS_mkdirat+0xaa/0xe0
> [<ffffffff811d7c39>] SyS_mkdir+0x19/0x20
> [<ffffffff8151496d>] system_call_fastpath+0x1a/0x1f
> Code: 25 88 dd 00 00 49 8b 50 08 4d 8b 20 4d 85 e4 0f 84 50 01 00 00
> 49 83 78 10 00 0f 84 45 01 00 00 49 63 47 20 48 8d 4a 01 4d 8b 07 <49>
> 8b 1c 04 4c 89 e0 65 49 0f c7 08 0f 94 c0 84 c0 74 bb 49 63
> RIP [<ffffffff811ad226>] __kmalloc_track_caller+0x86/0x260
> RSP <ffff8802f0abfc88>

followed by another GPF

> general protection fault: 0000 [#2] SMP
...
> RIP: 0010:[<ffffffff811aa26a>] [<ffffffff811aa26a>] __kmalloc+0x8a/0x280

in __kmalloc()

> [<ffffffff8132d81f>] acpi_ns_internalize_name+0x68/0xad

called from acpi to copy a different name.

I don't think the problem is anything cgroup / kernfs specific. The
allocator is GPFing internally when called from multiple unrelated
callers, and these are plain kmalloc allocations, not ones using a
caller-provided cache. It looks like something corrupted the memory
allocator's state and it's now faulting on whoever allocates next.
Most likely an illegal free or a use-after-free.
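
To illustrate why the faulting caller can be unrelated to the buggy one, here
is a minimal toy sketch, not the kernel's allocator code. It assumes a tiny
fixed-size object pool whose free objects store the link to the next free
object inside their own memory; all names (toy_alloc, toy_free,
demo_use_after_free) and sizes are hypothetical. A stale write into a freed
object clobbers that link, and the damage only shows up on a later allocation.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Toy object cache (hypothetical). A free object keeps the index of the
 * next free object in its first bytes, so the free list lives inside the
 * freed memory itself. */
#define NOBJ      4
#define OBJ_SIZE  16
#define LIST_END  UINT32_MAX

static unsigned char pool[NOBJ][OBJ_SIZE];
static uint32_t freelist_head;

static void toy_init(void)
{
    for (uint32_t i = 0; i < NOBJ; i++) {
        uint32_t next = (i + 1 < NOBJ) ? i + 1 : LIST_END;
        memcpy(pool[i], &next, sizeof(next));
    }
    freelist_head = 0;
}

/* Pops the head of the free list. This toy validates the head and sets
 * *faulted when it is garbage; an allocator that trusts the link would
 * dereference the garbage instead. */
static void *toy_alloc(int *faulted)
{
    void *obj;

    *faulted = 0;
    if (freelist_head == LIST_END)
        return NULL;                 /* pool exhausted */
    if (freelist_head >= NOBJ) {
        *faulted = 1;                /* corrupted link detected */
        return NULL;
    }
    obj = pool[freelist_head];
    memcpy(&freelist_head, obj, sizeof(freelist_head)); /* load next link */
    return obj;
}

/* Pushes an object back, writing the old head into the object. */
static void toy_free(void *obj)
{
    uint32_t idx = (uint32_t)(((unsigned char *)obj - &pool[0][0]) / OBJ_SIZE);

    assert(idx < NOBJ);
    memcpy(obj, &freelist_head, sizeof(freelist_head));
    freelist_head = idx;
}

/* Returns 1 if a later, unrelated allocation hits a corrupted link. */
static int demo_use_after_free(int stale_write)
{
    int faulted;
    char *buggy;

    toy_init();
    buggy = toy_alloc(&faulted);         /* buggy owner takes an object */
    toy_free(buggy);                     /* ...and frees it */
    if (stale_write)
        memset(buggy, 0x6b, OBJ_SIZE);   /* use-after-free: clobbers link */

    (void)toy_alloc(&faulted);           /* first victim: looks fine */
    (void)toy_alloc(&faulted);           /* second victim: garbage head */
    return faulted;
}
```

With the stale write enabled, the first allocation afterwards still succeeds
and only the second one trips over the bad link; without it, both succeed.
That delay is why the trace shows where the corruption was noticed, not where
it was caused.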

Steven, can you please post the full kernel log from boot till reboot?
It's usually a good idea to include the full log when reporting bugs,
as it's very easy to leave out the part that is actually relevant.

Thanks.

--
tejun