Re: Linux 2.6.29

From: Ric Wheeler
Date: Wed Mar 25 2009 - 15:53:35 EST


Jens Axboe wrote:
On Wed, Mar 25 2009, Jeff Garzik wrote:
Jens Axboe wrote:
On Tue, Mar 24 2009, Jeff Garzik wrote:
Linus Torvalds wrote:
But I really don't understand filesystem people who think that "fsck" is the important part, regardless of whether the data is valid or not. That's just stupid and _obviously_ bogus.
I think I can understand that point of view, at least:

More customers complain about hours-long fsck times than they do about silent data corruption of non-fsync'd files.


The point is, if you write your metadata earlier (say, every 5 sec) and the real data later (say, every 30 sec), you're actually MORE LIKELY to see corrupt files than if you try to write them together.

And if you write your data _first_, you're never going to see corruption at all.
Amen.
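
For readers following along, the "data first, metadata later" ordering has a rough application-level analogue: write the new contents to a temporary file, fsync it, and only then rename it over the old name. This is a minimal sketch of that pattern, not a description of what any filesystem does internally. The helper name durable_replace and the ".tmp" suffix are made up for illustration, and whether fsync actually reaches the media still depends on the barrier/flush behaviour argued about in this thread.

```python
import os

def durable_replace(path, data):
    """Replace `path` with `data`, flushing the data before the rename."""
    tmp = path + ".tmp"  # hypothetical temp-file naming, for illustration only
    fd = os.open(tmp, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o644)
    try:
        view = memoryview(data)
        while view:                      # os.write may write fewer bytes than asked
            written = os.write(fd, view)
            view = view[written:]
        os.fsync(fd)                     # data goes out first...
    finally:
        os.close(fd)
    os.replace(tmp, path)                # ...then the name (metadata) is switched
    # fsync the containing directory so the rename itself is flushed as well
    dirfd = os.open(os.path.dirname(os.path.abspath(path)), os.O_RDONLY)
    try:
        os.fsync(dirfd)
    finally:
        os.close(dirfd)
```

With this ordering a crash should leave either the old file or the complete new one under the name, provided the filesystem and drive honour the flushes.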

And, personal filesystem pet peeve: please encourage proper FLUSH CACHE use to give users the data guarantees they deserve. Linux's sync(2) and fsync(2) (and fdatasync, etc.) should poke the block layer to guarantee a media write.
fsync already does that, at least if you have barriers enabled on your
drive.
Erm, no, you don't enable barriers on your drive, they are not a hardware feature. You enable barriers via your filesystem.

Thanks for the lesson Jeff, I'm obviously not aware how that stuff
works...

Stating "fsync already does that" borders on false, because that assumes
(a) the user has a fs that supports barriers
(b) the user is actually aware of a 'barriers' mount option and what it means
(c) the user has turned on an option normally defaulted to off.

Or in other words, it pretty much never happens.

That is true, except if you use xfs/ext4. And this discussion is fine,
as was the one a few months back that got ext4 to enable barriers by
default. If I had submitted patches to do that back in 2001/2 when the
barrier stuff was written, I would have been shot for introducing such a
slow down. After people found out that it just wasn't something silly,
then you have a way to enable it.

I'd still wager that most people would rather have a 'good enough
fsync' on their desktops than incur the penalty of barriers or write
through caching. I know I do.

Furthermore, a blatantly obvious place to flush data to media -- fsync(2), fdatasync(2) and sync_file_range(2) -- should cause the block layer to issue a FLUSH CACHE for __any__ filesystem. But that doesn't happen either.

So, no, for 95% of Linux users, fsync does _not_ already do that. If you are lucky enough to use XFS or ext4, you're covered. That's it.

The point is that you need to expose this choice somewhere, and that
'somewhere' isn't manually editing fstab and enabling barriers or
fsync-for-real. And it should be easier.

Another problem is that FLUSH_CACHE sucks. Really. And not just on
ext3/ordered, generally. Write a 50 byte file, fsync, flush cache and
wait for the world to finish. Pretty hard to teach people to use a nicer
fdatasync(), when the majority of the cost now becomes flushing the
cache of that 1TB drive you happen to have 8 partitions on. Good luck
with that.
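
The "50 byte file, fsync" case is easy to try for yourself. The sketch below only times the fsync call; the function name time_small_fsync is invented here, and what the number means depends on the setup: with barriers off it may only measure the trip to the drive's write cache, with barriers on it would also include the cache flush.

```python
import os
import time

def time_small_fsync(path, size=50):
    """Write `size` bytes to `path` and return the seconds spent in fsync()."""
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o644)
    try:
        os.write(fd, b"x" * size)        # tiny payload, as in the example above
        start = time.perf_counter()
        os.fsync(fd)                     # the cost being discussed is in here
        return time.perf_counter() - start
    finally:
        os.close(fd)
```

Running it on an idle disk and again while something else is writing heavily to the same drive should give a feel for how much of the cost comes from unrelated dirty data.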

And, as I am sure that you do know, to add insult to injury, FLUSH_CACHE is per device (not file system).

When you issue an fsync() on a disk with multiple partitions, you will flush the data for all of its partitions from the write cache....
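
To see whether two paths would share one flush, you can map each file's device number back to its whole disk through sysfs. This is a rough sketch assuming the usual Linux layout, where /sys/dev/block/MAJOR:MINOR is a symlink into the device tree and a partition's directory contains a "partition" file and sits inside its disk's directory. The function names are invented for illustration, and stacked devices (LVM, md, device-mapper) are not resolved down to physical drives.

```python
import os

def whole_disk_for(dev, sysfs_root="/sys/dev/block"):
    """Return the name of the whole disk behind device number `dev`, or None."""
    link = os.path.join(sysfs_root, "%d:%d" % (os.major(dev), os.minor(dev)))
    if not os.path.exists(link):
        return None                      # e.g. tmpfs or a network filesystem
    node = os.path.realpath(link)
    if os.path.exists(os.path.join(node, "partition")):
        node = os.path.dirname(node)     # partition dirs live inside the disk's dir
    return os.path.basename(node)

def share_a_disk(path_a, path_b):
    """True if both paths live on partitions of the same whole disk."""
    disk_a = whole_disk_for(os.stat(path_a).st_dev)
    disk_b = whole_disk_for(os.stat(path_b).st_dev)
    return disk_a is not None and disk_a == disk_b
```

If share_a_disk() is true for two files on different filesystems, a cache flush triggered by an fsync on one would also push out whatever the other has sitting in that drive's write cache.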

ric
