Re: Linux 2.6.29

From: Ric Wheeler
Date: Mon Mar 30 2009 - 07:22:38 EST


Andreas T.Auer wrote:
On 30.03.2009 11:05 Alan Cox wrote:
It seems you still didn't get the point. ext3 data=ordered is not the
problem. The problem is that the average developer doesn't expect the fs
to _re-order_ stuff. This is how most common fs did work long before
No it isn't. Standard Unix file systems made no such guarantee and would
write out data out of order. The disk scheduler would then further
re-order things.

You surely know that better: Did fs actually write "later" data quite
long before "earlier" data? During the flush data may be re-ordered, but
was it also _done_ outside of it?

People keep forgetting that storage (even your commodity S-ATA class of drives) has a very large, volatile cache. The disk firmware can hold writes in that cache as long as it wants and reorder them in whatever way makes sense to it, and it makes no explicit ordering promises.

This is where the write barrier code comes in - for file systems that care about ordering for data, we use barrier ops to impose the required ordering.

In a similar way, fsync() gives applications the power to impose their own ordering.
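
As a rough illustration of that point, here is a minimal sketch of an application imposing its own ordering between two writes. The helper names write_all() and write_in_order() are made up for this example and are not from any library. It also assumes the file system really does push the drive's volatile cache out when fsync() is called, which is exactly the part that depends on barriers being enabled.

```c
#include <errno.h>
#include <fcntl.h>
#include <string.h>
#include <unistd.h>

/* Write the whole buffer, retrying on short writes and EINTR. */
static int write_all(int fd, const char *buf, size_t len)
{
    while (len > 0) {
        ssize_t n = write(fd, buf, len);
        if (n < 0) {
            if (errno == EINTR)
                continue;
            return -1;
        }
        buf += n;
        len -= (size_t)n;
    }
    return 0;
}

/*
 * Hypothetical helper: do not issue `second` until fsync() has
 * returned for `first`. Returns 0 on success, -1 on error with
 * errno set by the failing call.
 */
int write_in_order(int fd, const char *first, const char *second)
{
    if (write_all(fd, first, strlen(first)) < 0)
        return -1;

    /* Ordering point: nothing after this line is submitted until the
     * kernel reports the earlier data as flushed for this file. */
    if (fsync(fd) < 0)
        return -1;

    if (write_all(fd, second, strlen(second)) < 0)
        return -1;

    /* Second fsync() so the caller knows both pieces were flushed. */
    return fsync(fd);
}
```

Without the first fsync(), neither the page cache, the I/O scheduler nor the drive firmware is obliged to write `first` before `second`.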

If we assume that we can "save" an fsync cost with ordering mode, we have to keep in mind that the file system will need to do the expensive cache flushes in order to preserve its internal ordering.

If you think the "guarantees" from before ext3 are normal defaults you've
been writing junk code

I'm still on ReiserFS since it was considered stable in some SuSE 7.x.
And I expected it to be fairly ordered, but as a network protocol
programmer I didn't rely on the ordering of fs write-outs yet.

With reiserfs, you will have barriers on by default in SLES/openSUSE, which will keep at least the fs meta-data properly ordered....

ric
