OK, I've studied this a lot more and think I've figured it out. I think
calling it a race is incorrect, but I don't know how to succinctly describe
the problem otherwise.
Let's look at the first attachment, a test case which shows the problem.
When run against a Solaris NFS server, the readdir()/unlink() loop returns
".", "..", "foo", NULL, and the rmdir fails. I suspect a Linux NFS
server does the same. But when run against a netapp, it returns
".", "..", "foo", ".nfs...", NULL, ie it picks up the sillyrenamed
file immediately, and this allows the rmdir to succeed.
When foo is unlinked, nfs_unlink() does a sillyrename; this puts the
dentry on nfs_delete_queue, and (in the VFS) unhashes it from the dcache.
This causes a problem, because dentry->d_parent->d_inode is now guaranteed
to remain stale. (OK, I'm not really sure about this last part.)
Then readdir() returns the new .nfs file. This creates a NEW dentry
(we just moved the first one to nfs_delete_queue and did not create a
negative dentry), which has d_count == 1, so instead of sillyrenaming we
just remove it (even though we actually have this file open). Then the
rmdir succeeds.
Now we have a dentry on nfs_delete_queue which a) has already been
unlinked and b) whose dentry->d_parent, and dentry->d_parent->d_inode,
no longer exist. Of course this will cause confusion later!
Note that if a process does a drive-by on the .nfs file (another round
of unlinked-but-open) before we unlink it, we would sillyrename it again.
We'd now have two different dentries on the delete queue for the same
file. (One of them would just leak, I think--possible local DoS?)
Also note that in vfs_unlink() we do a d_delete() after the
i_op->unlink(); I think this guarantees that no call to sillyrename will
ever be passed a dentry with the DCACHE_NFSFS_RENAMED flag set (the flag
gets set in sillyrename, but the dentry is then unhashed, and the next
time through vfs_unlink() we have a new dentry with no flags set). The
DCACHE_NFSFS_RENAMED bit looks to be a holdover from 2.2, where the
dentry doesn't get unhashed.
I have 3 proposals for fixes, I've implemented 2 of them and patches
are attached against both 2.4.21-rc7 and 2.5.70.
1) Don't unhash the dentry after silly-renaming. In 2.2, each fs is
responsible for doing a d_delete(); in 2.4 it happens in the VFS, and
I think it was just an oversight that the 2.4 VFS doesn't consider
sillyrename (judging from the code and comments that are now cruft).
This preserves the unlinked-but-open semantic, but breaks rmdir. So
it's not a clear winner from a semantics POV. dentry->d_count is
always correct, which sounds like a plus.
The patch to make this work is utterly simple, which is a big plus.
2a) In nfs_unlink(), if not sillyrenaming, look for the file on
nfs_delete_queue. If present, remove it (since it's now extra).
This has the advantage of preserving the 'rm -rf' semantic, but
breaks unlinked-but-open since the parent dir can go away, and
dentry->d_count is not guaranteed to be correct.
I've implemented but not included this.
2b) Since this is really only a problem when the parent dir goes away,
do the same as above but only scan the queue in nfs_rmdir(), and
mark any entries whose d_parent is "us".
I've included this in favor of (2a) because it's simpler and should
give better performance in the common case.
3) In nfs_complete_unlink(), do a d_lookup() before waking the rpc task.
If an entry is found, go ahead and schedule the rpc. If no entry is
found, or a negative entry is found, just remove from the queue.
I couldn't get this to work.
With the #2 patch, I've taken the liberty of cleaning up stale comments;
they were quite annoying while trying to understand the code. :-( I would
have done the same for the #1 patch, but it's so small that I didn't
want to otherwise "pollute" it. If you decide to go with the #1 patch,
at least have a look at the #2 patch for the comment fixups.
If you need more data to convince yourself of this bug, I can certainly
provide it; just tell me what you'd like to see. But I hope I've
adequately described it.
This archive was generated by hypermail 2b29 : Sun Jun 15 2003 - 22:00:20 EST