Re: True fsync() in Linux (on IDE)

From: Matthias Andree
Date: Mon Mar 22 2004 - 15:29:10 EST


On Mon, 22 Mar 2004, Christoffer Hall-Frederiksen wrote:

> >If there is no such atomicity (except maybe in ext3fs data=journal or
> >the upcoming reiserfs4 - isn't there?), then nobody should claim so. If
> >the kernel cannot 100.00000000% guarantee the write is atomic, claiming
> >otherwise is plain fraud and nothing else.
> >
> >Some people bet their whole business/company and hence a fair deal of
> >their belongings on a single data base, and making them believe facts
> >that simply aren't reality is dangerous. These people will have very
> >little understanding for sloppiness here. Linux has no obligation to be
> >fast or reliable, but it MUST PROPERLY AND TRUTHFULLY state what it can
> >guarantee and what it cannot guarantee.
>
> Some databases (eg. oracle) can write a checksum for each database page
> to overcome this problem, as this is not just "a linux problem".

I am aware some databases support checksumming (Berkeley DB also does,
since v4.1 (*), and probably a lot more so they know where log recovery
starts) but does that make statements sensible that claim the timing
(some stochastic factor) would usually give "guarantees" about
atomicity of the individual page write when the hardware doesn't
guarantee anything beyond 512 bytes at a time? I think it does not.

I don't mind to beat up anyone, I'd just like to have the guarantees
documented without thin-ice kind of promises "usually you'll get more".

It's good to get more than what was asked for, but the application
designer cannot take that into account because he gets no guarantees. So
why bother wasting space and time for writing and reading such lines? Or
even discussing?

Maybe some interface so an application can query the maximum size of an
atomic write for any given file system (stat[v]fs extension maybe) would
be useful though, so applications can be optimized for data-journaling
file systems should these prove capable to provide "large atomic write"
guarantees.

(*) http://cvs.sourceforge.net/viewcvs.py/bogofilter/bogofilter/src/datastore_db.c?only_with_tag=branch-db-txn#rev1.93.2.5

--
Matthias Andree

Encrypt your mail: my GnuPG key ID is 0x052E7D95
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/