Re: [RFC PATCH] fpathconf() for fsync() behavior

From: Andrew Morton
Date: Thu Apr 23 2009 - 01:23:00 EST


On Wed, 22 Apr 2009 20:12:57 -0400 Valerie Aurora Henson <vaurora@xxxxxxxxxx> wrote:

> In the default mode for ext3 and btrfs, fsync() is both slow and
> unnecessary for some important application use cases - at the same
> time that it is absolutely required for correctness for other modes of
> ext3, ext4, XFS, etc. If applications could easily distinguish
> between the two cases, they would be more likely to be correct and
> fast.
>
> How about an fpathconf() variable, something like _PC_ORDERED? E.g.:
>
>     /* Unoptimized demo of an optional fsync() */
>     write(fd, buf, len);
>     /* Only fsync() if we need it */
>     if (fpathconf(fd, _PC_ORDERED) != 1)
>             fsync(fd);
>     rename(tmp_path, new_path);
>
> I know of two specific real-world cases in which this would
> significantly improve performance: (a) fsync() before rename(), (b)
> fsync() of the parent directory of a newly created file. Case (b) is
> particularly nasty when you have multiple threads creating files in
> the same directory because the dir's i_mutex is held across fsync() -
> file creates become limited to the speed of sequential fsync()s.
>
> Conceptual libc patch below.

Would it be better to implement new syscall(s) with finer-grained control
and better semantics? Then userspace would just need to do:

fsync_on_steroids(fd, FSYNC_BEFORE_RENAME);

and that all gets down into the filesystem which can then work out what
it needs to do to implement the command.
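
For comparison, here is a rough userspace sketch of the write/fsync/rename
sequence under discussion, using the fpathconf() check from the quoted
proposal. _PC_ORDERED is only the proposed name and no libc defines it, so
the check is guarded by #ifdef and falls back to always calling fsync().
The helper names fs_orders_data_before_rename() and replace_file() are made
up for illustration. Neither fsync_on_steroids() nor FSYNC_BEFORE_RENAME
exists, so they are not used here.

```c
#include <errno.h>
#include <fcntl.h>
#include <stdio.h>
#include <unistd.h>

/*
 * Return 1 if fsync() may be skipped before rename(), 0 otherwise.
 * _PC_ORDERED is the hypothetical variable from the proposal; when it is
 * not defined we conservatively report "not ordered".
 */
static int fs_orders_data_before_rename(int fd)
{
#ifdef _PC_ORDERED
	return fpathconf(fd, _PC_ORDERED) == 1;
#else
	(void)fd;
	return 0;
#endif
}

/*
 * Write buf to tmp_path, flush it if the filesystem requires that, then
 * rename it over new_path. Returns 0 on success, -1 on failure.
 */
static int replace_file(const char *tmp_path, const char *new_path,
			const void *buf, size_t len)
{
	const char *p = buf;
	int fd = open(tmp_path, O_WRONLY | O_CREAT | O_TRUNC, 0644);

	if (fd < 0)
		return -1;

	/* write() may be short or interrupted, so loop until done. */
	while (len > 0) {
		ssize_t n = write(fd, p, len);

		if (n < 0) {
			if (errno == EINTR)
				continue;
			goto fail;
		}
		p += n;
		len -= (size_t)n;
	}

	/* Only fsync() if we need it. */
	if (!fs_orders_data_before_rename(fd) && fsync(fd) != 0)
		goto fail;

	if (close(fd) != 0) {
		unlink(tmp_path);
		return -1;
	}
	if (rename(tmp_path, new_path) != 0) {
		unlink(tmp_path);
		return -1;
	}
	return 0;

fail:
	close(fd);
	unlink(tmp_path);
	return -1;
}
```

With a dedicated syscall, the conditional in the middle would collapse
into a single call and the filesystem would make the decision instead of
the application.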
