Re: wishful thinking about atomic, multi-sector or full MD stripewidth, writes in storage

From: Theodore Tso
Date: Mon Sep 07 2009 - 09:12:06 EST


On Mon, Sep 07, 2009 at 01:45:34PM +0200, Pavel Machek wrote:
>
> Yes, but ext3 was designed to handle the partial write (according to
> tytso).

I'm not sure what made you think that I said that. In practice things
usually work out, as a consequence of the fact that ext3 uses physical
block journaling, but it's not perfect, because...

> > Also, when you enable the write cache (MD or not) you are buffering
> > multiple MBs of data that can go away on power loss. That is far greater
> > (10x) than the exposure the partial RAID rewrite case worries about.
>
> Yes, that's what barriers are for. Except that they are not there on
> MD0/MD5/MD6. They actually work on local sata drives...

Yes, but ext3 does not enable barriers by default (the patch has been
submitted, but akpm has balked because he doesn't like the performance
degradation and doesn't believe that Chris Mason's "workload of doom"
is a common case). Note, though, that without a barrier it is possible
for dirty blocks to remain in the drive's track buffer for *minutes*
without being written to the spinning rust platters.

See Chris Mason's report of this phenomenon here:

http://lkml.org/lkml/2009/3/30/297
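A toy model may help show why the ordering matters for physical block
journaling. The sketch below is purely illustrative and is not ext3's
code: ToyDisk, commit_transaction, recover and the block names J0, J1
and JC are all hypothetical. It assumes a drive whose volatile write
cache may destage writes in any order, and it treats a barrier as a full
cache flush before and after the commit record, which is roughly what a
barrier-tagged commit write amounts to. Without the flushes, the commit
record can reach the platter ahead of the journal blocks it vouches for,
so replay after a power drop has nothing valid to copy.

```python
from dataclasses import dataclass, field

# Hypothetical block numbers: two journal blocks holding physical copies of
# modified metadata, plus one commit record.
JOURNAL = ["J0", "J1"]
COMMIT = "JC"


@dataclass
class ToyDisk:
    """Toy drive with a volatile write cache (not a model of any real drive)."""
    platter: dict = field(default_factory=dict)  # durable: survives power loss
    cache: list = field(default_factory=list)    # volatile: pending (block, data) writes

    def write(self, block, data):
        # The drive acknowledges at once; the data only sits in its cache.
        self.cache.append((block, data))

    def destage_one(self, index):
        # The drive may move cached writes to the platter in any order.
        block, data = self.cache.pop(index)
        self.platter[block] = data

    def flush(self):
        # Cache flush: every pending write reaches the platter.
        while self.cache:
            self.destage_one(0)

    def power_loss(self):
        # Whatever is still in the volatile cache is gone.
        self.cache.clear()


def commit_transaction(disk, updates, use_barrier):
    """Log (home_block, data) pairs in the journal, then write the commit record."""
    assert len(updates) <= len(JOURNAL), "toy journal only has two blocks"
    for jblock, (home, data) in zip(JOURNAL, updates):
        disk.write(jblock, (home, data))
    if use_barrier:
        disk.flush()  # journal blocks are durable before the commit record is sent
    disk.write(COMMIT, len(updates))
    if use_barrier:
        disk.flush()  # the commit record itself is durable


def recover(disk):
    """Replay after a crash; returns 'clean', 'replayed' or 'corrupt'."""
    if COMMIT not in disk.platter:
        return "clean"  # no commit record: the transaction is discarded
    count = disk.platter[COMMIT]
    if any(j not in disk.platter for j in JOURNAL[:count]):
        return "corrupt"  # commit record refers to journal blocks that never landed
    for jblock in JOURNAL[:count]:
        home, data = disk.platter[jblock]
        disk.platter[home] = data  # copy the logged block image to its home location
    return "replayed"


updates = [("inode_table", "new inodes"), ("block_bitmap", "new bitmap")]

# No barrier: the drive happens to destage the commit record first,
# then power drops before the journal blocks leave the cache.
disk = ToyDisk()
commit_transaction(disk, updates, use_barrier=False)
disk.destage_one(len(disk.cache) - 1)  # the commit record was the last write queued
disk.power_loss()
print(recover(disk))  # corrupt

# With barriers the same power drop is harmless.
disk = ToyDisk()
commit_transaction(disk, updates, use_barrier=True)
disk.power_loss()
print(recover(disk))  # replayed
```

In a real journal the missing blocks would hold stale contents from an
earlier transaction and would be replayed without complaint; the toy
model reports "corrupt" instead so that the failure is visible.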

Here's Chris Mason's "barrier test", which will corrupt ext3 filesystems
50% of the time after a power drop if the filesystem is mounted with
barriers disabled (which is the default; use the mount option
barrier=1 to enable barriers):

http://lkml.indiana.edu/hypermail/linux/kernel/0805.2/1518.html

(Yes, ext4 has barriers enabled by default.)
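For checking an option string by hand, here is a small hypothetical
helper, barriers_enabled, that starts from the defaults stated in this
thread (barriers off for ext3, on for ext4) and lets explicit options
override them. Those defaults are the 2009 ones described above and may
differ on other kernel versions. Accepting the bare "barrier" and
"nobarrier" spellings, and letting the last option win, are assumptions
of this sketch; the thread itself only mentions barrier=1.

```python
# Barrier defaults as stated in this thread (2009); other kernels may differ.
DEFAULT_BARRIER = {"ext3": False, "ext4": True}


def barriers_enabled(fstype, options):
    """Return whether barriers would be on for fstype mounted with options.

    options is a comma-separated mount option string such as "rw,barrier=1".
    Accepting "barrier"/"nobarrier" and letting the last option win are
    assumptions of this sketch.
    """
    if fstype not in DEFAULT_BARRIER:
        raise ValueError(f"no known barrier default for {fstype!r}")
    enabled = DEFAULT_BARRIER[fstype]
    for opt in options.split(","):
        opt = opt.strip()
        if opt in ("barrier", "barrier=1"):
            enabled = True
        elif opt in ("nobarrier", "barrier=0"):
            enabled = False
    return enabled


print(barriers_enabled("ext3", "rw,data=ordered"))  # False: ext3 default
print(barriers_enabled("ext3", "rw,barrier=1"))     # True
print(barriers_enabled("ext4", "rw,noatime"))       # True: ext4 default
```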

- Ted