Re: [PATCH v3 1/1] vfs: iversion truncate bug fix

From: J. Bruce Fields
Date: Thu Jan 05 2012 - 16:36:39 EST


On Thu, Jan 05, 2012 at 01:14:01PM -0500, Christoph Hellwig wrote:
> On Thu, Jan 05, 2012 at 03:53:54PM +1100, Dave Chinner wrote:
> > That's seems like a rather unreliable way of detecting that a file
> > has changed to me. I mean, only ext4 uses inode_inc_version() when
> > it internally dirties an inode, and only ext4 sets the MS_I_VERSION
> > so that inode_inc_version is only called for ext4 inodes when
> > timestamps change.
>
> And even ext4 only does it when using the non-default "i_version"
> mount option.
>
> > Hence just adding an increment to the truncate case doesn't seem to
> > be sufficient to me. e.g. what about the equivalent case of having a
> > hole punched in the file via fallocate? The file has definitely
> > changed, but i_version won't change....
> >
> > Perhaps bumping i_version in __mark_inode_dirty() might be the best
> > way to capture all changes (other than timestamp updates) to any
> > inode regardless of the filesystem type?
>
> It has the same problem as the timestamp updates doing that right now -
> the fs can't do locking around it, and it can't return errors. That's
> something affecting at least btrfs, xfs and IIRC ubifs, and probably
> the cluster filesystems as well. The right answer is to replace the
> timestmap updates which are the only places doing that with a method
> as Josef had planned to do, and then we can include the i_version
> updates in there.
>
> That assumes we figure out a coherent way to do it - except for the
> conditional abuse in file_updates_times it's currently entirely under
> fs control. So the best way to fix it would be to:
>
> - move the fs-private use into those filesystems actually using it.
> Note that a lot less actually check for it rather than just updating
> it based on some cargo cult, and most only do so for directories.
> - figure a why what exact change count semantics NFS (and IMA) want

For NFS:

The NFSv4 protocol has a "change attribute" that a client should be able
to use determine whether its cache is up to date. It therefore doesn't
have to be a "count" necessarily, as long as it changes every time.
(And is extremely unlikely to repeat.)

It should be on by default for any exportable filesystem capable of
supporting it.

It has to change on both data and metadata changes. (Like ctime.)

It has to work for both files and directories.

It has to persist on disk--a client should be able to revalidate its
cache by comparing change attributes fetched before and after a server
reboot. (But: I don't know if there's any easy way to avoid the corner
case where unsync'd updates cause i_version to go backwards after
reboot, and then to repeat a previous i_version value after subsequent
updates.)

--b.

> and find a way to implement them so that the fs can tell the callers
> that they don't exist.
>
> Btw, does IMA care about these beeing persistent?
> --
> To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in
> the body of a message to majordomo@xxxxxxxxxxxxxxx
> More majordomo info at http://vger.kernel.org/majordomo-info.html
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/