Re: Linux 2.6.29

From: Jeff Garzik
Date: Wed Mar 25 2009 - 15:33:29 EST


Jens Axboe wrote:
> On Tue, Mar 24 2009, Jeff Garzik wrote:
>> Linus Torvalds wrote:
>>> But I really don't understand filesystem people who think that "fsck" is the important part, regardless of whether the data is valid or not. That's just stupid and _obviously_ bogus.
>> I think I can understand that point of view, at least:
>>
>> More customers complain about hours-long fsck times than they do about silent data corruption of non-fsync'd files.
>>
>>> The point is, if you write your metadata earlier (say, every 5 sec) and the real data later (say, every 30 sec), you're actually MORE LIKELY to see corrupt files than if you try to write them together.
>>>
>>> And if you write your data _first_, you're never going to see corruption at all.
>> Amen.
>>
>> And, personal filesystem pet peeve: please encourage proper FLUSH CACHE use to give users the data guarantees they deserve. Linux's sync(2) and fsync(2) (and fdatasync, etc.) should poke the block layer to guarantee a media write.
>
> fsync already does that, at least if you have barriers enabled on your
> drive.

Erm, no, you don't enable barriers on your drive; they are not a hardware feature. You enable barriers via your filesystem.

Stating "fsync already does that" borders on false, because that assumes
(a) the user has a fs that supports barriers
(b) the user is actually aware of a 'barriers' mount option and what it means
(c) the user has turned on an option normally defaulted to off.

Or in other words, it pretty much never happens.

Furthermore, the blatantly obvious places to flush data to media -- fsync(2), fdatasync(2) and sync_file_range(2) -- should cause the block layer to issue a FLUSH CACHE for _any_ filesystem. But that doesn't happen either.

So, no, for 95% of Linux users, fsync does _not_ already do that. If you are lucky enough to use XFS or ext4, you're covered. That's it.
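For readers coming at this from userspace, here is a minimal sketch in Python of the "write your data _first_" ordering, assuming a Linux/POSIX system: the file contents are pushed out with fsync(2) before a rename publishes them, and the containing directory is fsync'd afterwards so the rename itself is persisted. The function name durable_replace and the ".tmp" suffix are hypothetical, chosen only for illustration. Per the argument above, fsync returning does not by itself prove the data left the drive's volatile write cache; that still depends on the kernel and filesystem issuing FLUSH CACHE (barriers).

```python
import os


def durable_replace(path: str, data: bytes) -> None:
    """Hypothetical helper: replace `path` with `data` so that, after a crash,
    readers should see either the old contents or the new ones, not a
    truncated file. Assumes POSIX semantics (Linux)."""
    tmp = path + ".tmp"  # hypothetical temp-file naming; same directory as path
    fd = os.open(tmp, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o644)
    try:
        view = memoryview(data)
        while view:  # os.write may write fewer bytes than requested
            n = os.write(fd, view)
            view = view[n:]
        # Step 1, data first: ask the kernel to write the file's data and
        # metadata to the device. Whether the drive's write cache is also
        # flushed depends on the kernel, filesystem and barrier settings.
        os.fsync(fd)
    finally:
        os.close(fd)
    # Step 2, metadata second: atomically swap the new name into place.
    os.replace(tmp, path)
    # Step 3: fsync the directory so the rename is persisted as well.
    dfd = os.open(os.path.dirname(os.path.abspath(path)), os.O_RDONLY)
    try:
        os.fsync(dfd)
    finally:
        os.close(dfd)
```

Using os.fdatasync in place of the first os.fsync would skip flushing inode metadata that is not needed to read the data back (such as the modification time), though it is not available on every platform Python supports.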

Jeff


