Re: [PATCH] VFS bugfix: two read_inode() calles withoutclear_inode() call between

From: David Woodhouse
Date: Thu May 05 2005 - 04:12:45 EST


On Wed, 2005-05-04 at 14:58 -0700, Andrew Morton wrote:
> That doesn't really settle the question of whether the callers are broken,
> whether they are doing something which the VFS really should support and
> what need to be done to the VFS to support it properly.

Filesystems exported by NFS _will_ get iget() called for recently-
deleted inodes. JFFS2 and (I believe) NTFS also do internal things which
end up having the same effect.

The premise is simple: regardless of who calls iget() and when they do
it, the VFS should not call the filesystem's read_inode() method twice
consecutively for the same inode without ever calling clear_inode() or
delete_inode() in between.

That's what __wait_on_freeing_inode() was introduced for -- so we can
make sure the clear_inode() call actually happened before we call
read_inode() again for the same inode. Unfortunately there is still a
code path where we can get it wrong, and that's what Artem is fixing.

> Looking at the proposed patch: what happens if an inode is on its way to
> dispose_list() and someone tries to do an iget() on it? I don't think I_LOCK
> is set, so __wait_on_freeing_inode() will no longer provide this guarantee:

> /*
> * If we try to find an inode in the inode hash while it is being deleted, we
> * have to wait until the filesystem completes its deletion before reporting
> * that it isn't found. This is because iget will immediately call
> * ->read_inode, and we want to be sure that evidence of the deletion is found
> * by ->read_inode.

That comment isn't true any more. Look at what __wait_on_freeing_inode()
actually does, and observe the fact that all its callers actually loop
and start again after calling it.

The current implementation of __wait_on_freeing_inode() waits until it
_might_ have changed, not until it _has_ changed. That's why it's OK for
it just to be a yield() or a wait on a bit_waitqueue.

I'm not convinced I _like_ that implementation, mind you -- it's changed
since I last looked at it. But I don't see that there's anything
strictly broken about it.

--
dwmw2


-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/