Re: Recurring Oops in link_path_walk()

From: christophe leroy
Date: Sat Nov 21 2015 - 05:37:53 EST




On 20/11/2015 22:17, Al Viro wrote:
On Fri, Nov 20, 2015 at 12:58:40PM -0600, Scott Wood wrote:

Looks like garbage in dentry->d_inode, assuming that reconstruction of
the mapping of line numbers to addresses is correct... Not sure it is,
though; what's more, just how does LR manage to point to the insn right
after the call of dput(), of all things?
When "bl dput" is executed, LR gets set to the instruction after the bl.
After dput returns, LR still has that value. Presumably the call to mntput
was skipped via the beq. Nothing else modifies LR between the dput return and
the faulting address.
OK, AFAICS it's this:
604)	do {
605)		struct path link = *path;
606)		void *cookie;
607)
608)		res = follow_link(&link, nd, &cookie);
609)		if (res)
610)			break;
611)		res = walk_component(nd, path, LOOKUP_FOLLOW);
612)		put_link(nd, &link, cookie);
and we are seeing assorted garbage as link.dentry->d_inode at the put_link()
call. What's really interesting is that follow_link() has returned 0, which means
that it must have passed through
849) *p = dentry->d_inode->i_op->follow_link(dentry, nd);
with
825) struct dentry *dentry = link->dentry;
upstream of that, and link as seen by follow_link() is &link as seen by the
caller (nested_symlink()); IOW, at that point link.dentry->d_inode used to
be a valid pointer.

Do you have something resembling a reproducer or a chance to get a crash
dump at that point?


Unfortunately no, I have no way to reproduce it; it happens very seldom.
I'm not sure what kind of crash dump I could get when it happens.

Maybe I could try adding delays/scheduling points between follow_link() and put_link() to see if it happens more often?


I also got a few other oopses in different functions, even more seldom than this one. I'm not sure they have anything to do with this one, but I've put them below just in case. If they are worth investigating as well, I can also provide the function disassembly for them:



[46796.501487] Unable to handle kernel paging request for data at address 0x000002dd
[46796.514365] Faulting instruction address: 0xc00c5978
[46796.524217] Oops: Kernel access of bad area, sig: 11 [#1]
[46796.529351] PREEMPT CMPC885
[46796.532144] CPU: 0 PID: 1107 Comm: snmpd Not tainted 3.18.14 #43
[46796.539790] task: c682d340 ti: c6728000 task.ti: c6728000
[46796.545119] NIP: c00c5978 LR: c00c5974 CTR: c00efeb4
[46796.550033] REGS: c6729e00 TRAP: 0300 Not tainted (3.18.14)
[46796.557497] MSR: 00009032 <EE,ME,IR,DR,RI> CR: 24042424 XER: 20000000
[46796.564043] DAR: 000002dd DSISR: c0000000
[46796.564043] GPR00: c00c5974 c6729eb0 c682d340 00000000 c5a02734 00000003 00000000 00851d4a
[46796.564043] GPR08: 000005ae 000002b9 00009032 000001e4 24042424 1001c8cc 7fc835f8 100ad378
[46796.564043] GPR16: 00000000 7fc835f0 7fc835e8 7fc835e0 7fc835d8 7fc835d0 7fc835c8 7fc835c0
[46796.564043] GPR24: 0fe59f14 000002ac c6a44b48 c6056110 c5e03168 c5a026e0 c6728000 c1a026e0
[46796.596017] NIP [c00c5978] destroy_inode+0x38/0x84
[46796.600736] LR [c00c5974] destroy_inode+0x34/0x84
[46796.605344] Call Trace:
[46796.607793] [c6729eb0] [c00c5974] destroy_inode+0x34/0x84 (unreliable)
[46796.614271] [c6729ec0] [c00c1d90] __dentry_kill+0x2a8/0x304
[46796.619763] [c6729ee0] [c00c27c8] dput+0xd0/0x1d8
[46796.624416] [c6729f00] [c00adf54] __fput+0x134/0x1fc
[46796.629319] [c6729f20] [c002de28] task_work_run+0xac/0xf4
[46796.634655] [c6729f40] [c000bba4] do_user_signal+0x74/0xc4
[46796.640023] Instruction dump:
[46796.642955] 39430078 93e1000c 90010014 7c7f1b78 81230078 7d295278 7d290034 5529d97e
[46796.650612] 69290001 0f090000 4bffff45 813f0014 <81290024> 81290004 2f890000 419e0020

Here, it is inode->i_sb that seems wrong.

c00c5940 <destroy_inode>:
struct inode *inode = container_of(head, struct inode, i_rcu);
kmem_cache_free(inode_cachep, inode);
}

static void destroy_inode(struct inode *inode)
{
c00c5940: 7c 08 02 a6 mflr r0
c00c5944: 94 21 ff f0 stwu r1,-16(r1)
BUG_ON(!list_empty(&inode->i_lru));
c00c5948: 39 43 00 78 addi r10,r3,120
struct inode *inode = container_of(head, struct inode, i_rcu);
kmem_cache_free(inode_cachep, inode);
}

static void destroy_inode(struct inode *inode)
{
c00c594c: 93 e1 00 0c stw r31,12(r1)
c00c5950: 90 01 00 14 stw r0,20(r1)
c00c5954: 7c 7f 1b 78 mr r31,r3
BUG_ON(!list_empty(&inode->i_lru));
c00c5958: 81 23 00 78 lwz r9,120(r3)
c00c595c: 7d 29 52 78 xor r9,r9,r10
c00c5960: 7d 29 00 34 cntlzw r9,r9
c00c5964: 55 29 d9 7e rlwinm r9,r9,27,5,31
c00c5968: 69 29 00 01 xori r9,r9,1
c00c596c: 0f 09 00 00 twnei r9,0
__destroy_inode(inode);
c00c5970: 4b ff ff 45 bl c00c58b4 <__destroy_inode>
if (inode->i_sb->s_op->destroy_inode)
c00c5974: 81 3f 00 14 lwz r9,20(r31)
==> c00c5978: 81 29 00 24 lwz r9,36(r9)
c00c597c: 81 29 00 04 lwz r9,4(r9)
c00c5980: 2f 89 00 00 cmpwi cr7,r9,0
c00c5984: 41 9e 00 20 beq cr7,c00c59a4 <destroy_inode+0x64>
inode->i_sb->s_op->destroy_inode(inode);
else
call_rcu(&inode->i_rcu, i_callback);
}
c00c5988: 80 01 00 14 lwz r0,20(r1)


[32878.259271] Unable to handle kernel paging request for data at address 0xf030f0f4
[32878.266488] Faulting instruction address: 0xc00b65ec
[32878.271404] Oops: Kernel access of bad area, sig: 11 [#1]
[32878.276712] PREEMPT CMPC885
[32878.279510] CPU: 0 PID: 1391 Comm: snmpd Not tainted 3.18.14 #43
[32878.287157] task: c6812b50 ti: c6c2a000 task.ti: c6c2a000
[32878.292482] NIP: c00b65ec LR: c00b65c8 CTR: 00000000
[32878.297395] REGS: c6c2bd40 TRAP: 0300 Not tainted (3.18.14)
[32878.304860] MSR: 00009032 <EE,ME,IR,DR,RI> CR: 22042422 XER: 00000000
[32878.311408] DAR: f030f0f4 DSISR: c0000000
[32878.311408] GPR00: c00b9bb8 c6c2bdf0 c6812b50 ffffff9c c6478010 00000051 f0e1f0f0 f030f0f0
[32878.311408] GPR08: f0f8f0f0 c2c05380 f030f0f0 00000220 42042422 1001c8cc 7fffffff 0ffedab0
[32878.311408] GPR16: 3f800000 1001c314 559b51dc 7fca8508 1001bcb0 00000000 7fca84f8 1001be28
[32878.311408] GPR24: 0fe8c008 1001be28 00000041 c6478000 c6c2bf08 ffffff9c c6c2be88 c6c2be88
[32878.343378] NIP [c00b65ec] path_init+0x25c/0x488
[32878.347929] LR [c00b65c8] path_init+0x238/0x488
[32878.352365] Call Trace:
[32878.354798] [c6c2bdf0] [c0531500] 0xc0531500 (unreliable)
[32878.360158] [c6c2be20] [c00b9bb8] path_openat+0x74/0x678
[32878.365402] [c6c2be80] [c00ba1ec] do_filp_open+0x30/0x8c
[32878.370657] [c6c2bf00] [c00ab9ac] do_sys_open+0x14c/0x238
[32878.375997] [c6c2bf40] [c000b27c] ret_from_syscall+0x0/0x38
[32878.381449] Instruction dump:
[32878.384379] 70a70040 41820114 4bf90a81 812203f0 81090004 710a0001 40820240 81490014
[32878.392039] 80c90010 915f001c 90df0018 7d475378 <814a0004> 71460001 40820210 80e90004

[122726.996005] Unable to handle kernel paging request for data at address 0xf0f0f0f4
[122727.003271] Faulting instruction address: 0xc00b65ec
[122727.008271] Oops: Kernel access of bad area, sig: 11 [#1]
[122727.013667] PREEMPT CMPC885
[122727.016550] CPU: 0 PID: 567 Comm: snmpd Not tainted 3.18.14 #43
[122727.024196] task: c63bb9c0 ti: c647e000 task.ti: c647e000
[122727.029608] NIP: c00b65ec LR: c00b65c8 CTR: 00000000
[122727.034607] REGS: c647fd40 TRAP: 0300 Not tainted (3.18.14)
[122727.042159] MSR: 00009032 <EE,ME,IR,DR,RI> CR: 24222422 XER: 00000000
[122727.048793] DAR: f0f0f0f4 DSISR: c0000000
[122727.048793] GPR00: c00b9bb8 c647fdf0 c63bb9c0 ffffff9c c6432010 00000051 f0f0f0f0 f0f0f0f0
[122727.048793] GPR08: f0f0f0f0 c2501040 f0f0f0f0 000000da 44222422 1001c8cc 00000000 0000000a
[122727.048793] GPR16: 10151c70 7f84fab1 7f84fbe8 7f84ff40 7f84faa8 00000000 10127b90 7f84fbf0
[122727.048793] GPR24: 0ff681f8 1014a590 00000041 c6432000 c647ff08 ffffff9c c647fe88 c647fe88
[122727.080850] NIP [c00b65ec] path_init+0x25c/0x488
[122727.085486] LR [c00b65c8] path_init+0x238/0x488
[122727.090008] Call Trace:
[122727.092528] [c647fdf0] [c0531500] 0xc0531500 (unreliable)
[122727.097974] [c647fe20] [c00b9bb8] path_openat+0x74/0x678
[122727.103304] [c647fe80] [c00ba1ec] do_filp_open+0x30/0x8c
[122727.108642] [c647ff00] [c00ab9ac] do_sys_open+0x14c/0x238
[122727.114070] [c647ff40] [c000b27c] ret_from_syscall+0x0/0x38
[122727.119609] Instruction dump:
[122727.122625] 70a70040 41820114 4bf90a81 812203f0 81090004 710a0001 40820240 81490014
[122727.130370] 80c90010 915f001c 90df0018 7d475378 <814a0004> 71460001 40820210 80e90004

