Re: True fsync() in Linux (on IDE)

From: Matthias Andree
Date: Mon Mar 22 2004 - 10:18:05 EST


Jens Axboe schrieb am 2004-03-22:

> There's no such thing as atomic writes bigger than a sector really, we
> just pretend there is. Timing usually makes this true.

If there is no such atomicity (except maybe in ext3fs data=journal or
the upcoming reiserfs4 - isn't there?), then nobody should claim so. If
the kernel cannot 100.00000000% guarantee the write is atomic, claiming
otherwise is plain fraud and nothing else.

Some people bet their whole business/company and hence a fair deal of
their belongings on a single data base, and making them believe facts
that simply aren't reality is dangerous. These people will have very
little understanding for sloppiness here. Linux has no obligation to be
fast or reliable, but it MUST PROPERLY AND TRUTHFULLY state what it can
guarantee and what it cannot guarantee.

> For bigger atomic writes, 2.4 SUSE kernel had some nasty hack (called
> blk-atomic) to prevent reordering by the io scheduler to avoid partial
> blocks from databases.

That does not make a write atomic if the scheduled blocks are still
written one at a time (and I believe tagged command queueing won't help
to unroll partial writes either).

If the hardware support is missing, it is prudent to say just that and
not make any bogus promises about platter inertia and "it usually
works". (who says that the filter curves adjust to the decreasing
platter speed and the electronics are sustained for long enough? how
about write verify and remapping broken blocks?)

So we only write one hardware block size atomically, usually 512 bytes
on ATA and SCSI disk drives (MO might do 2048 at a time, but why
introduce complexity). That's a data point in this whole fsync()
discussion.

--
Matthias Andree

Encrypt your mail: my GnuPG key ID is 0x052E7D95
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/