Re: wishful thinking about atomic, multi-sector or full MD stripewidth, writes in storage

From: Pavel Machek
Date: Mon Sep 07 2009 - 07:45:49 EST


Hi!

> Note that even without MD RAID, the file system issues I/Os in file
> system block size units (4096 bytes normally) and most commodity storage
> devices use a 512-byte sector size, which means that we have to update 8
> 512-byte sectors.
>
> Drives can (and do) have multiple platters and surfaces and it is
> perfectly normal to have contiguous logical ranges of sectors map to
> non-contiguous sectors physically. Imagine a 4KB write stripe that
> straddles two adjacent tracks on one platter (requiring a seek) or mapped
> across two surfaces (requiring a head switch). Also, a remapped sector
> can require more or less a full surface seek from wherever you are to
> the remapped sector area of the drive.

Yes, but ext3 was designed to handle partial writes (according to
tytso).
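
To make the partial write concrete, here is a rough Python sketch of a
4096-byte block torn across its 8 512-byte sectors. It assumes each
sector is written atomically and that sectors reach the media in
order; real drives may reorder them. The names torn_write and
sector_states are made up for illustration only.

```python
SECTOR_SIZE = 512
BLOCK_SIZE = 4096
SECTORS_PER_BLOCK = BLOCK_SIZE // SECTOR_SIZE  # 8 sectors per fs block


def torn_write(old_block: bytes, new_block: bytes, sectors_written: int) -> bytes:
    """Block contents if power fails after `sectors_written` sectors landed.

    Assumption: sectors are written in order and each one is atomic.
    """
    assert len(old_block) == len(new_block) == BLOCK_SIZE
    assert 0 <= sectors_written <= SECTORS_PER_BLOCK
    cut = sectors_written * SECTOR_SIZE
    # The first part of the block is new data, the rest is still old data.
    return new_block[:cut] + old_block[cut:]


def sector_states(block: bytes, new_block: bytes) -> list:
    """Label each sector of `block` as 'new' or 'old' by comparing to new_block."""
    states = []
    for i in range(SECTORS_PER_BLOCK):
        s = slice(i * SECTOR_SIZE, (i + 1) * SECTOR_SIZE)
        states.append("new" if block[s] == new_block[s] else "old")
    return states
```

With an all-zero old block and an all-0xff new block, a failure after
three sectors leaves three "new" sectors followed by five "old" ones,
which is exactly the mixed state the file system has to cope with.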

> These are all examples of how a power loss can leave a partial
> update of that 4KB write range of sectors, even on a local (non-MD)
> device.

Yes, but the ext3 journal protects metadata integrity in that case.

> In other words, this is not just an MD issue, it is entirely possible
> even with non-MD devices.
>
> Also, when you enable the write cache (MD or not) you are buffering
> multiple MBs of data that can go away on power loss. That is a far
> greater exposure (10x) than the partial RAID rewrite case worries about.

Yes, that's what barriers are for. Except that they are not supported on
MD RAID0/RAID5/RAID6. They actually work on local SATA drives...
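
From user space, the closest thing to asking for a barrier is fsync().
A minimal Python sketch, assuming a POSIX system; durable_write is a
made-up helper name. Whether the data really reaches the platters
still depends on the kernel and the block device honouring the cache
flush, which is the whole point above.

```python
import os


def durable_write(path: str, data: bytes) -> None:
    """Write `data` to `path` and ask the kernel to push it to stable storage."""
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o644)
    try:
        view = memoryview(data)
        while view:
            # os.write may write fewer bytes than requested, so loop.
            written = os.write(fd, view)
            view = view[written:]
        # Flush the page cache for this file. With a volatile drive write
        # cache, durability additionally needs working barriers/flushes.
        os.fsync(fd)
    finally:
        os.close(fd)
```

If the stack below the file system drops the flush, fsync() returns
success and the data can still be sitting in the drive's write cache.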

Pavel
--
(english) http://www.livejournal.com/~pavelmachek
(cesky, pictures) http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html