Re: [patch] fs: avoid I_NEW inodes

From: Jan Kara
Date: Tue Mar 10 2009 - 12:03:39 EST


Hi,

On Tue 10-03-09 14:41:06, Nick Piggin wrote:
> On Thu, Mar 05, 2009 at 12:12:26PM +0100, Jan Kara wrote:
> > On Thu 05-03-09 11:16:37, Nick Piggin wrote:
> > > On Thu, Mar 05, 2009 at 11:00:01AM +0100, Jan Kara wrote:
> > > > On Thu 05-03-09 07:45:54, Nick Piggin wrote:
> > > > > after ~1hour of running. Previously, the new warnings would start immediately
> > > > > and hang would happen in under 5 minutes.
> > > > A quick grep seems to indicate that you've still missed a few cases,
> > > > haven't you? I still see the same problem in
> > > > drop_caches.c:drop_pagecache_sb() scanning, inode.c:invalidate_inodes()
> > > > scanning, and dquot.c:add_dquot_ref() scanning.
> > > > Otherwise the patch looks fine.
> > >
> > > I thought they should be OK; drop_pagecache_sb doesn't play with flags,
> > > invalidate_inodes won't if refcount is elevated, and I think add_dquot_ref
> > > won't if writecount is not elevated...
> > Ah, ok, you are probably right.
> >
> > > But maybe that's a bit fragile and it would be better policy to always
> > > skip I_NEW in these traversals?
> > Yes, it seems too fragile to me. I'm not saying we have to forbid
> > everything for I_NEW inodes but I think we should set clear simple rules
> > what is protected by I_NEW and then verify that all sites which can come
> > across such inodes obey them.
>
> OK, sorry for the delay, what do you think of the following patch on top
> of the last?
Thanks for the patch. I have a few comments. See below.

> ---
>
> To be on the safe side, it should be less fragile to exclude I_NEW inodes
> from inode list scans by default (unless there is an important reason to
> have them).
>
> Normally they will get excluded (e.g. by zero refcount or writecount),
> however it is a bit fragile for list walkers to know exactly which parts of
> the inode state are set up and valid to test when in I_NEW. So along these
> lines, move I_NEW checks upward as well (sometimes taking I_FREEING etc.
> checks with them too -- this shouldn't be a problem, should it?)
>
> Signed-off-by: Nick Piggin <npiggin@xxxxxxx>
>
> ---
>  fs/dquot.c                  |    6 ++++--
>  fs/drop_caches.c            |    2 +-
>  fs/inode.c                  |    2 ++
>  fs/notify/inotify/inotify.c |   16 ++++++++--------
>  4 files changed, 15 insertions(+), 11 deletions(-)
>
> Index: linux-2.6/fs/dquot.c
> ===================================================================
> --- linux-2.6.orig/fs/dquot.c
> +++ linux-2.6/fs/dquot.c
> @@ -789,12 +789,12 @@ static void add_dquot_ref(struct super_b
>
>  	spin_lock(&inode_lock);
>  	list_for_each_entry(inode, &sb->s_inodes, i_sb_list) {
> +		if (inode->i_state & (I_FREEING|I_WILL_FREE|I_NEW))
> +			continue;
>  		if (!atomic_read(&inode->i_writecount))
>  			continue;
>  		if (!dqinit_needed(inode, type))
>  			continue;
> -		if (inode->i_state & (I_FREEING|I_WILL_FREE))
> -			continue;
>
>  		__iget(inode);
>  		spin_unlock(&inode_lock);
> @@ -870,6 +870,8 @@ static void remove_dquot_ref(struct supe
>
>  	spin_lock(&inode_lock);
>  	list_for_each_entry(inode, &sb->s_inodes, i_sb_list) {
> +		if (inode->i_state & I_NEW)
> +			continue;
>  		if (!IS_NOQUOTA(inode))
>  			remove_inode_dquot_ref(inode, type, tofree_head);
>  	}
Hmm, this scan has to cover I_NEW inodes too, because they can already have
quota pointers initialized, so skipping them could leave dangling quota
references behind. Nasty. So instead of skipping them, just add a comment
here, something like:
/*
 * We have to scan I_NEW inodes as well because they can already have
 * quota pointers initialized. Luckily, we need to touch only quota
 * pointers and these have separate locking (dqptr_sem).
 */
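
To illustrate why skipping I_NEW is a problem in this scan, here is a toy
userspace model. It is only a sketch, not kernel code: model_inode,
MODEL_I_NEW (with an arbitrary value) and model_remove_dquot_ref() are names
made up for this example, and a plain pointer stands in for the quota
reference.

```c
#include <stddef.h>

/* Illustrative flag value only; not the kernel's definition of I_NEW. */
#define MODEL_I_NEW 0x1u

/* Hypothetical userspace stand-in for struct inode. */
struct model_inode {
	unsigned int i_state;
	int *dquot;	/* non-NULL means a quota reference is still held */
};

/*
 * Drop the quota reference of every inode in the array. If skip_new is
 * nonzero, inodes in MODEL_I_NEW state are skipped, as in the hunk above.
 * Returns the number of quota references still held after the scan.
 */
static size_t model_remove_dquot_ref(struct model_inode *inodes, size_t n,
				     int skip_new)
{
	size_t i, left = 0;

	for (i = 0; i < n; i++) {
		if (skip_new && (inodes[i].i_state & MODEL_I_NEW))
			continue;
		inodes[i].dquot = NULL;
	}

	/* Count what the scan left behind. */
	for (i = 0; i < n; i++)
		if (inodes[i].dquot)
			left++;

	return left;
}
```

With two inodes that both hold a reference, one of them in the new state,
a scan with skip_new set leaves one reference behind, while a scan with
skip_new clear leaves none.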

> Index: linux-2.6/fs/drop_caches.c
> ===================================================================
> --- linux-2.6.orig/fs/drop_caches.c
> +++ linux-2.6/fs/drop_caches.c
> @@ -18,7 +18,7 @@ static void drop_pagecache_sb(struct sup
>
>  	spin_lock(&inode_lock);
>  	list_for_each_entry(inode, &sb->s_inodes, i_sb_list) {
> -		if (inode->i_state & (I_FREEING|I_WILL_FREE))
> +		if (inode->i_state & (I_FREEING|I_WILL_FREE|I_NEW))
>  			continue;
>  		if (inode->i_mapping->nrpages == 0)
>  			continue;
> Index: linux-2.6/fs/inode.c
> ===================================================================
> --- linux-2.6.orig/fs/inode.c
> +++ linux-2.6/fs/inode.c
> @@ -356,6 +356,8 @@ static int invalidate_list(struct list_h
>  		if (tmp == head)
>  			break;
>  		inode = list_entry(tmp, struct inode, i_sb_list);
> +		if (inode->i_state & I_NEW)
> +			continue;
If somebody is still setting up inodes at this point, I think we are in
serious trouble, so a WARN_ON would be more appropriate here.
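
As a rough userspace sketch of one way to read this (warn, then skip the
inode anyway): model_warn_on() below is a made-up stand-in that mimics how
the kernel's WARN_ON() evaluates to its condition, and model_inode,
MODEL_I_NEW and model_invalidate_list() are likewise hypothetical names with
arbitrary values. Whether the real code should still skip the inode after
warning is an assumption of this sketch.

```c
#include <stddef.h>
#include <stdio.h>

/* Illustrative flag value only; not the kernel's definition of I_NEW. */
#define MODEL_I_NEW 0x1u

/* Hypothetical userspace stand-in for struct inode. */
struct model_inode {
	unsigned int i_state;
	int invalidated;
};

/* Number of warnings emitted so far. */
static int model_warnings;

/* Report the condition if it is true and return it, like WARN_ON(). */
static int model_warn_on(int cond, const char *what)
{
	if (cond) {
		model_warnings++;
		fprintf(stderr, "WARNING: %s\n", what);
	}
	return cond;
}

/*
 * Invalidate every inode in the array. An inode still in MODEL_I_NEW state
 * is reported loudly and then left alone. Returns the number of inodes
 * that were invalidated.
 */
static size_t model_invalidate_list(struct model_inode *inodes, size_t n)
{
	size_t i, done = 0;

	for (i = 0; i < n; i++) {
		if (model_warn_on((inodes[i].i_state & MODEL_I_NEW) != 0,
				  "I_NEW inode seen during invalidation"))
			continue;
		inodes[i].invalidated = 1;
		done++;
	}
	return done;
}
```

The point of warning instead of silently skipping is that the unexpected
state shows up in the logs rather than being papered over.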

>  		invalidate_inode_buffers(inode);
>  		if (!atomic_read(&inode->i_count)) {
>  			list_move(&inode->i_list, dispose);
> Index: linux-2.6/fs/notify/inotify/inotify.c
> ===================================================================
> --- linux-2.6.orig/fs/notify/inotify/inotify.c
> +++ linux-2.6/fs/notify/inotify/inotify.c
> @@ -380,6 +380,14 @@ void inotify_unmount_inodes(struct list_
>  		struct list_head *watches;
>
>  		/*
> +		 * We cannot __iget() an inode in state I_CLEAR, I_FREEING, or
> +		 * I_WILL_FREE which is fine because by that point the inode
> +		 * cannot have any associated watches.
> +		 */
Update the comment to mention I_NEW as well?

> +		if (inode->i_state & (I_CLEAR|I_FREEING|I_WILL_FREE|I_NEW))
> +			continue;
> +
> +		/*
>  		 * If i_count is zero, the inode cannot have any watches and
>  		 * doing an __iget/iput with MS_ACTIVE clear would actually
>  		 * evict all inodes with zero i_count from icache which is
> @@ -388,14 +396,6 @@ void inotify_unmount_inodes(struct list_
>  		if (!atomic_read(&inode->i_count))
>  			continue;
>
> -		/*
> -		 * We cannot __iget() an inode in state I_CLEAR, I_FREEING, or
> -		 * I_WILL_FREE which is fine because by that point the inode
> -		 * cannot have any associated watches.
> -		 */
> -		if (inode->i_state & (I_CLEAR | I_FREEING | I_WILL_FREE))
> -			continue;
> -
>  		need_iput_tmp = need_iput;
>  		need_iput = NULL;
>  		/* In case inotify_remove_watch_locked() drops a reference. */
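
For the inotify hunk, the reordered checks can be sketched in userspace as
follows. Again this is only a model with made-up names (model_inode,
model_skip_on_unmount()) and arbitrary flag values, and the comment wording
is just one possibility, assuming that an inode which is still being set up
cannot have watches yet.

```c
/* Illustrative flag values only; not the kernel's definitions. */
#define MODEL_I_WILL_FREE 0x1u
#define MODEL_I_FREEING   0x2u
#define MODEL_I_CLEAR     0x4u
#define MODEL_I_NEW       0x8u

/* Hypothetical userspace stand-in for struct inode. */
struct model_inode {
	unsigned int i_state;
	int i_count;
};

/* Returns 1 if the unmount scan should skip this inode, 0 otherwise. */
static int model_skip_on_unmount(const struct model_inode *inode)
{
	/*
	 * We cannot __iget() an inode in state I_CLEAR, I_FREEING,
	 * I_WILL_FREE, or I_NEW, which is fine because an inode in any of
	 * these states cannot have associated watches.
	 */
	if (inode->i_state & (MODEL_I_CLEAR | MODEL_I_FREEING |
			      MODEL_I_WILL_FREE | MODEL_I_NEW))
		return 1;

	/* Only after the state check is the reference count consulted. */
	if (inode->i_count == 0)
		return 1;

	return 0;
}
```

Testing the state first means the scan never has to reason about whether
i_count is already meaningful for an inode that is still being set up.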

Honza
--
Jan Kara <jack@xxxxxxx>
SUSE Labs, CR