Re: writing file to disk: not as easy as it looks

From: Theodore Tso
Date: Tue Dec 02 2008 - 09:04:58 EST


On Tue, Dec 02, 2008 at 10:40:59AM +0100, Pavel Machek wrote:
> Actually, it looks like POSIX file interface is on the lowest step of
> Rusty's scale: one that is impossible to use correctly. Yes, it seems
> impossible to reliably&safely write file to disk under Linux. Double
> plus uncool.
>
> So... how to write file to disk and wait for it to reach the stable
> storage, with proper error handling?

Are you trying to do this in C or shell? There is no "fsync" shell
command as far as I know, which is what is confusing me. And whether
"> file" checks for errors or not obviously depends on the application
which is writing to stdout. Some might check for errors, some might
not....

Why do you feel the need to error check "fsync ../.." and "fsync
../../..", et. al?

I can understand why you might want to fsync the containing directory
to make sure the directory entry got written to disk --- but if you're
that paranoid, many modern filesystems use some kind of tree structure
for the directory, and there is always the chance that a second later,
in a b-tree node split, due to a disk error the directory entry gets
lost.

What exactly are your requirements here, and what are you trying to
do? What are you worried about? Most MTA's are quite happy settling
with an fsync() to make sure the data made it to the disk safely and
the super-paranoid might also keep an open fd on the spool directory
and fsync that too. That's been enough for most POSIX programs.

More generally, if you have a higher need for making sure, most system
administrators will spend effort robustifying the storage layer (i.e.,
RAID, battery-backed journals, etc.) rather than obsession over some
API that can tell an application --- "you know that file you just
finished writing 50 milliseconds ago? Well, another application
created 100 files, which forced a b-tree node split, and
golly-gee-willickers, when I tried to modify the directory to
accomodate the node split, we ended up losing 50 directory entries,
including that file you just finished writing, fsyncing, and
closing...."

- Ted
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/