[PATCH] VFS: Add back check for !inode in walk_component()

From: Steven Rostedt
Date: Thu May 07 2015 - 12:52:52 EST



Commit 698934df8b45 "VFS: Combine inode checks with d_is_negative() and
d_is_positive() in pathwalk" removed a check for inode being NULL in
walk_component() where the type is tested. Stressing my tracefs create
and remove instances while reading the files now triggers this:

BUG: unable to handle kernel NULL pointer dereference at 0000001c
IP: [<c05383a6>] inode_permission+0x2d/0x6c
*pdpt = 0000000030d9a001 *pde = 0000000000000000
Oops: 0000 [#1] SMP
Modules linked in: ip6t_REJECT nf_reject_ipv6 nf_conntrack_ipv6 nf_defrag_ipv6 ip6table_filter ip6_tables ipv6 microcode ppdev parport_pc parport r8169
CPU: 0 PID: 3201 Comm: ftrace-test-mki Not tainted 4.1.0-rc2-test+ #94
Hardware name: MSI MS-7823/CSM-H87M-G43 (MS-7823), BIOS V1.6 02/22/2014
task: efa2ac20 ti: ed7a8000 task.ti: ed7a8000
EIP: 0060:[<c05383a6>] EFLAGS: 00010282 CPU: 0
EIP is at inode_permission+0x2d/0x6c
EAX: 00000001 EBX: 00000000 ECX: 00000006 EDX: 00000081
ESI: 00000000 EDI: ef92201b EBP: ed7a9e0c ESP: ed7a9df8
DS: 007b ES: 007b FS: 00d8 GS: 0033 SS: 0068
CR0: 80050033 CR2: 0000001c CR3: 31cb4c20 CR4: 001407f0
Stack:
c0538379 c101c228 00000000 00000081 ed7a9ef8 ed7a9e64 c0538c7e c0538c33
c1012e98 00000000 ed7a9ef8 00000000 00000006 efa2ac20 efa2ac20 e1919d01
00000000 c04767e1 ed7a9e64 c05374fb f1392210 ee3c0240 00000000 c05391b5
Call Trace:
[<c0538379>] ? __inode_permission+0x91/0x91
[<c0538c7e>] link_path_walk+0x7a/0x3db
[<c0538c33>] ? link_path_walk+0x2f/0x3db
[<c04767e1>] ? trace_hardirqs_on+0xb/0xd
[<c05374fb>] ? read_seqcount_begin+0x6a/0x77
[<c05391b5>] ? path_init+0x1d6/0x326
[<c05392f9>] path_init+0x31a/0x326
[<c0538fdf>] ? link_path_walk+0x3db/0x3db
[<c05312a3>] ? get_empty_filp+0x128/0x190
[<c053ad56>] path_openat+0x1a3/0x3da
[<c0408c1a>] ? native_sched_clock+0x46/0x4b
[<c053bcdf>] do_filp_open+0x2e/0x6f
[<c052f591>] do_sys_open+0x7c/0x108
[<c052f558>] ? do_sys_open+0x43/0x108
[<c0cc4f87>] ? sysenter_exit+0xf/0x16
[<c052f63d>] SyS_open+0x20/0x22
[<c0cc4f58>] sysenter_do_call+0x12/0x12
Code: e5 53 83 ec 10 3e 8d 74 26 00 89 c3 89 44 24 08 a1 d8 7b 25 c1 89 55 f8 c7 04 24 79 83 53 c0 89 44 24 04 e8 0e b0 f9 ff 8b 55 f8 <8b> 4b 1c f6 c2 02 74 2e f6 41 30 01 66 8b 03 74 25 25 00 f0 00
EIP: [<c05383a6>] inode_permission+0x2d/0x6c SS:ESP 0068:ed7a9df8
CR2: 000000000000001c
---[ end trace 54b6a3ccfbef84c6 ]---

Adding a bunch of debug I found that the race is the following:

CPU1 CPU2
---- ----
mkdir(foo)
d_instantiate(dentry, inode);
spin_lock(inode->i_lock);
spin_lock(dentry->d_lock);
__d_set_inode_and_type();
link_path_walk()
walk_component()
lookup_fast(nd, path, &inode);
*inode = path->d_entry->d_inode; (inode == NULL)

dentry->d_inode = inode;
smp_wmb();
dentry->d_flags = flags;

if (d_is_negative(path->d_entry))
[ fails ]

[...]

nd->inode = inode; (where inode is NULL);

Then in the next loop of link_path_walk()

err = may_lookup(nd);
inode_permission(nd->inode...)
reference to nd->inode->i_sb (BOOM!)

Without this patch I can easily cause the bug, with this patch, I have
yet to trigger it.

Cc: David Howells <dhowells@xxxxxxxxxx>
Signed-off-by: Steven Rostedt <rostedt@xxxxxxxxxxx>
---
fs/namei.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/fs/namei.c b/fs/namei.c
index 4a8d998b7274..cdd066680de9 100644
--- a/fs/namei.c
+++ b/fs/namei.c
@@ -1585,7 +1585,7 @@ static inline int walk_component(struct nameidata *nd, struct path *path,
inode = path->dentry->d_inode;
}
err = -ENOENT;
- if (d_is_negative(path->dentry))
+ if (!inode || d_is_negative(path->dentry))
goto out_path_put;

if (should_follow_link(path->dentry, follow)) {
--
1.8.3.1

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/