Re: True fsync() in Linux (on IDE)

From: Jens Axboe
Date: Mon Mar 22 2004 - 08:24:03 EST


On Mon, Mar 22 2004, Heikki Tuuri wrote:
> 2. An 'atomic' file write in the OS does not solve the problem of partially
> written database pages in a power outage if the disk drive is not guaranteed
> to stay operational long enough to be able to write the whole page
> physically to disk. An InnoDB data page is 16 kB, and probably not
> guaranteed to be any 'atomic' unit of physical disk writes. However, in
> practice, half-written pages (either because of the OS or the disk) seem to
> be very rare.

There's really no such thing as an atomic write bigger than a sector; we
just pretend there is. Timing usually makes this true.

For bigger atomic writes, the 2.4 SUSE kernel had a nasty hack (called
blk-atomic) that prevented reordering by the io scheduler, to avoid
partially written database blocks. For 2.6 it's much easier, since you can
just send a 16 kB write out as one unit. So you send out the db data page
as one hardware request, which is pretty much as atomic as you can make it
from the OS point of view. As long as you avoid a big window between parts
of this data page, you've narrowed the window of breakage from many seconds
to milliseconds (maybe even 0, if drive platter inertia guarantees that
the write gets the data onto the platter).
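
To illustrate the 2.6 approach described above, here is a minimal Python
sketch that submits one 16 kB page with a single pwrite() call and then
calls fsync(). The names PAGE_SIZE and write_page are made up for this
example and come from neither InnoDB nor the kernel. Whether the kernel
turns the call into exactly one hardware request is an assumption: it
depends on the io path (buffered vs. direct), on the page being contiguous
on disk, and on the block layer. fsync() may also not flush the drive's own
write cache.

```python
import os

# 16 kB InnoDB data page, as mentioned in the thread (32 sectors of 512 bytes).
PAGE_SIZE = 16 * 1024


def write_page(fd, page_no, page):
    """Write one full page with a single pwrite() call, then fsync().

    Hypothetical helper: it only guarantees that the application hands
    the page to the kernel in one system call. It does NOT make the
    write atomic on the physical disk.
    """
    if len(page) != PAGE_SIZE:
        raise ValueError("page must be exactly PAGE_SIZE bytes")

    offset = page_no * PAGE_SIZE
    written = os.pwrite(fd, page, offset)
    if written != PAGE_SIZE:
        # A short write would leave exactly the gap between parts of
        # the page that we are trying to avoid, so treat it as an error.
        raise OSError("short write: %d of %d bytes" % (written, PAGE_SIZE))

    # Ask the kernel to push the data to the device before returning.
    os.fsync(fd)
    return written
```

The point of the single call is to avoid splitting the page across several
writes from the application side, which is the "big window between parts of
this data page" mentioned above.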

--
Jens Axboe
