Re: [patch 7/8] fs: fix or note I_DIRTY handling bugs in filesystems

From: Nick Piggin
Date: Tue Jan 04 2011 - 02:52:58 EST


On Tue, Jan 04, 2011 at 01:39:44AM -0500, Christoph Hellwig wrote:
> On Tue, Jan 04, 2011 at 05:04:52PM +1100, Nick Piggin wrote:
> > However I strongly believe that filesystems should be able to access
> > and manipulate the inode dirty state directly. If you agree with that,
> > then I think they should be able to access the lock required for that.
> > Filesystems will want to keep their internal state in synch with vfs
> > visible state most likely (eg. like your hfsplus patches), and _every_
> > time we do "loose" coupling between state bits like this (eg. page and
> > buffer state; page and pte state; etc), it turns out to be a huge mess
> > of races and subtle code and ordering.
>
> I've probably done the two most complicated fsync implementations in xfs
> and hfsplus myself, and I'd really prefer the interface to be as simple
> as possible.

Agreed, and as I said, I'll change inode_writeback_begin/end to do the
locking and just return the dirty bit mask etc.


> The way the I_DIRTY_* flags and the datasync parameter to
> ->fsync interact are almost a recipe for getting it wrong, which in
> fact most implementations that tried to be smart did. See gfs2 and
> ocfs2 comments in this thread for classic examples.

Right, if you want a helper to get the correct mask of bits required
that's fine and I agree, but locking is a separate issue too: if
filesystems are trying to keep private state in sync with vfs state,
then they _need_ to do it properly with the proper locking. I think
your hfsplus implementation had a bug or two in this area didn't it?
(although I ended up getting sidetracked with all these bugs half
way through looking at that).
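
To make the "helper to get the correct mask" idea concrete, here is a
minimal userspace sketch, not kernel code. It assumes the usual reading
of the flags: fdatasync only has to care about I_DIRTY_DATASYNC, while a
full fsync has to care about I_DIRTY_SYNC as well. The flag values are
local stand-ins and both function names are hypothetical.

```c
/* Userspace sketch: flag values are local stand-ins, not kernel headers. */
#define I_DIRTY_SYNC     (1u << 0)  /* inode dirty, fdatasync may skip it */
#define I_DIRTY_DATASYNC (1u << 1)  /* metadata needed to reach the data */
#define I_DIRTY_PAGES    (1u << 2)  /* dirty pagecache pages */

/* Hypothetical helper: which dirty bits oblige ->fsync to write the inode. */
unsigned int fsync_dirty_mask(int datasync)
{
	if (datasync)
		return I_DIRTY_DATASYNC;
	return I_DIRTY_SYNC | I_DIRTY_DATASYNC;
}

/* Hypothetical helper: does this i_state need an inode write for this sync? */
int inode_needs_write(unsigned int i_state, int datasync)
{
	return (i_state & fsync_dirty_mask(datasync)) != 0;
}
```

With this, an inode that is only I_DIRTY_SYNC needs no inode write for
datasync, but does for a full fsync. The mask says nothing about
locking, which is the separate issue.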


> If we actually get filesystems that need to do smarts in
> checking/clearing the I_DIRTY_* flag we can discuss proper interfaces
> for it - duplicating guts of i_state manipulations sounds like a
> relatively bad idea for that.

I disagree, but we'll explore it further later.

It is a couple of lines to check and clear dirty bits. Not rocket
science and I think it is far better to make it explicit what is
happening to the filesystem. The disconnect between what the vfs is
doing and what the filesystems thought should be happening is what
caused all these bugs to start with.
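
For illustration only, the "couple of lines" look roughly like the
following userspace model. The struct, the function name and the pthread
mutex are all assumptions standing in for the inode, a filesystem-side
helper and whatever lock protects i_state; this is a sketch of the
pattern, not the actual kernel interface.

```c
#include <pthread.h>

/* Local stand-in flag values for this sketch, not kernel headers. */
#define I_DIRTY_SYNC     (1u << 0)
#define I_DIRTY_DATASYNC (1u << 1)
#define I_DIRTY_PAGES    (1u << 2)

/* Hypothetical model of an inode: only the lock and the state word. */
struct model_inode {
	pthread_mutex_t lock;   /* stands in for the lock protecting i_state */
	unsigned int i_state;
};

/*
 * Check and clear the requested dirty bits in one critical section.
 * Returns the bits that were set, so the caller knows exactly what it
 * is now responsible for writing back.
 */
unsigned int model_test_and_clear_dirty(struct model_inode *inode,
					unsigned int mask)
{
	unsigned int dirty;

	pthread_mutex_lock(&inode->lock);
	dirty = inode->i_state & mask;
	inode->i_state &= ~dirty;
	pthread_mutex_unlock(&inode->lock);

	return dirty;
}
```

The point of doing the test and the clear under the same lock is that a
concurrent dirtying either lands before the clear (and is returned to
the caller) or after it (and stays set); it cannot be lost in between.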

Anyway, long story short, I'll drop the inode_lock export and move the
locking and manipulation into inode_writeback_begin/end for now.

Thanks,
Nick
