Re: Oops in 4.0.0-rc6: __destroy_inode

From: Jan Kara
Date: Tue Apr 14 2015 - 06:03:40 EST


On Wed 08-04-15 08:37:00, Peter Hurley wrote:
> [ + Al Viro, linux-fsdevel ]
>
> On 04/08/2015 07:12 AM, Tobias Hoffmann wrote:
> > Hi,
> >
> > after updating from 3.19.0-rc4 to 4.0.0-rc6 I've experienced the appended two similar oopses.
> > In both cases they occurred without obvious cause after less than 2 days uptime, and caused Xorg to hang - requiring a manual reboot (init 6 via ssh did not run to completion).
> > The only other thing I updated was userspace libdrm + xorg-video-nouveau, but that should not cause oopses, right?
> >
> > With 3.19.0-rc4 I had uptime > 40 days -- and then a general protection fault at __d_lookup (also appended) which seems unrelated to the __destroy_inode oopses.
> > I'm now back at 3.19.
> >
> > Tobias
> >
> > PS: please CC.
> >
> > ---
> > BUG: unable to handle kernel paging request at ffffffffff3cffff
> > IP: [<ffffffff8115a1c7>] __destroy_inode+0x77/0xd0
> > PGD 16b8067 PUD 16ba067 PMD 17f0067 PTE 0
> > Oops: 0002 [#1] PREEMPT SMP
> > Modules linked in: snd_hrtimer snd_usb_audio snd_usbmidi_lib ipt_REJECT nf_reject_ipv4 iptable_filter xt_REDIRECT nf_nat_redirect xt_tcpudp iptable_nat nf_conntrack_ipv4 nf_defrag_ipv4 nf_nat_ipv4 nf_nat nf_conntrack ip_tables x_tables nfsd auth_rpcgss oid_registry exportfs nfs_acl nfs lockd grace sunrpc ppdev lp snd_hda_codec_realtek snd_hda_codec_generic hid_multitouch snd_hda_intel snd_hda_controller snd_hda_codec snd_hwdep snd_pcm_oss snd_mixer_oss snd_pcm snd_mpu401 snd_seq_dummy snd_mpu401_uart snd_seq_oss snd_seq_midi snd_rawmidi nouveau wmi video ttm drm_kms_helper drm snd_seq_midi_event snd_seq cfbfillrect cfbimgblt snd_seq_device snd_timer cfbcopyarea evdev snd psmouse i2c_algo_bit parport_pc soundcore ns558 button parport i2c_nforce2 gameport acpi_cpufreq
> > CPU: 1 PID: 472 Comm: kswapd0 Not tainted 4.0.0-rc6-00188-gf8b3d8a-dirty #32
> > Hardware name: To Be Filled By O.E.M. To Be Filled By O.E.M./ALiveNF5-eSATA2+., BIOS P2.10 04/09/2008
> > task: ffff8801aa0f3250 ti: ffff8801aaa64000 task.ti: ffff8801aaa64000
> > RIP: 0010:[<ffffffff8115a1c7>] [<ffffffff8115a1c7>] __destroy_inode+0x77/0xd0
> > RSP: 0000:ffff8801aaa67bd8 EFLAGS: 00210286
> > RAX: ffffffffff3cfffe RBX: ffff88010238d978 RCX: 00000000000024c0
> > RDX: 0000000000000001 RSI: ffff88010238da08 RDI: ffffffffff3cffff
> > RBP: ffff8801aaa67be8 R08: ffffffff8115b3d0 R09: ffff8801aaa67d40
> > R10: 0000000000000400 R11: 0000000000000000 R12: ffff88010238d9f8
> > R13: ffffffff815210e0 R14: 0000000000000000 R15: 00000000000000a9
> > FS: 0000000000000000(0000) GS:ffff8801b1c80000(0000) knlGS:00000000f1604b40
> > CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b
> > CR2: ffffffffff3cffff CR3: 00000000c0e95000 CR4: 00000000000006e0
> > Stack:
> > 0000000000000003 ffff88010238d978 ffff8801aaa67c08 ffffffff8115a7d1
> > ffff88010238d978 ffff88010238d978 ffff8801aaa67c38 ffffffff8115a922
> > ffff8801aaa67c38 ffff8801aaa67c78 ffff8800cda4e800 ffff8800cda4eb40
> > Call Trace:
> > [<ffffffff8115a7d1>] destroy_inode+0x21/0x60
> > [<ffffffff8115a922>] evict+0x112/0x180
> > [<ffffffff8115a9c9>] dispose_list+0x39/0x50
> > [<ffffffff8115b825>] prune_icache_sb+0x45/0x50
> > [<ffffffff811447e3>] super_cache_scan+0x153/0x1a0
> > [<ffffffff811105a3>] shrink_slab.part.55.constprop.60+0x1a3/0x250
> > [<ffffffff811129c1>] shrink_zone+0xa1/0xb0
> > [<ffffffff81112dbf>] kswapd+0x3ef/0x700
> > [<ffffffff811129d0>] ? shrink_zone+0xb0/0xb0
> > [<ffffffff810aaf04>] kthread+0xc4/0xe0
> > [<ffffffff810aae40>] ? kthread_freezable_should_stop+0x60/0x60
> > [<ffffffff814f6588>] ret_from_fork+0x58/0x90
> > [<ffffffff810aae40>] ? kthread_freezable_should_stop+0x60/0x60
> > Code: 48 8b 7b 10 48 8d 47 ff 48 83 f8 fd 77 0a 48 85 ff 74 05 f0 ff 0f 74 3c 48 8b 7b 18 48 8d 47 ff 48 83 f8 fd 77 0a 48 85 ff 74 05 <f0> ff 0f 74 14 65 48 ff 0d c4 3d eb 7e 48 83 c4 08 5b 5d c3 0f
> > RIP [<ffffffff8115a1c7>] __destroy_inode+0x77/0xd0
> > RSP <ffff8801aaa67bd8>
> > CR2: ffffffffff3cffff
So we are very likely oopsing in the atomic_dec_and_test() in
posix_acl_release(), called on inode->i_default_acl. The value of
i_default_acl is in RDI (ffffffffff3cffff, which is also the faulting
address in CR2), and it looks very much like a corrupted ACL_NOT_CACHED,
which is -1. So this is likely random memory corruption where someone wrote
0x3c into your inode. Very likely a kernel bug, but impossible to debug
without more info...
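
The reasoning above can be checked with a small user-space sketch. It is
an illustration only, not kernel code: the helper names acl_guard_passes()
and corrupted_bits() are hypothetical, and the guard is modelled on the
"lea -1(%rdi),%rax; cmp $-3,%rax; ja" sequence visible in the Code: bytes,
assuming that sequence is the combined NULL / ACL_NOT_CACHED test.

```c
#include <stdint.h>

/* Assumption: stands in for the kernel's ACL_NOT_CACHED, i.e. (void *)-1. */
#define ACL_NOT_CACHED_VALUE UINT64_MAX

/*
 * Hypothetical helper modelling the compiled guard: the pointer is
 * released only if (acl - 1) <= -3 as an unsigned compare, which
 * rejects exactly 0 (NULL) and -1 (ACL_NOT_CACHED).
 */
static int acl_guard_passes(uint64_t acl)
{
    return (acl - 1) <= (uint64_t)-3;
}

/* Hypothetical helper: bits that differ from the ACL_NOT_CACHED sentinel. */
static uint64_t corrupted_bits(uint64_t acl)
{
    return acl ^ ACL_NOT_CACHED_VALUE;
}
```

With the RDI value from the oops, acl_guard_passes(0xffffffffff3cffffULL)
returns 1, so the bogus pointer reaches the locked decrement, and
corrupted_bits() returns 0x0000000000c30000: only one byte differs from
-1, and that byte reads 0x3c instead of 0xff.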

Honza
--
Jan Kara <jack@xxxxxxx>
SUSE Labs, CR