Re: scary ext2 filesystem question

Gerard Roudier (groudier@club-internet.fr)
Sun, 27 Dec 1998 19:11:47 +0100 (MET)

On Sun, 27 Dec 1998, Alexander Viro wrote:

> On Sun, 27 Dec 1998, Alan Cox wrote:
>
> > > the overhead was below 5% *in initial implementation*. Overhead compared
> > > to async variant, that is. It can be done for Linux implementation but
> > > we'll have to clean the VFS stuff up before.
> >
> > Stephen is doing full journalling.
> >
> > > are not independent. If we want to guarantee that on-disk copy is
> > > consistent at any moment we must submit these changes in _some_ order. The
> >
> > No order provides total consistency without a journalling log as far as I can
> > tell. Please provide an ordering example for extending a file that does not
> > either expose unwritten blocks to the user (security failure) or potentially
> > have to go and clean a block up afterwards
> >
> > Alan
>
> Erm... What I really wonder is how much would we lose on the following
> policy:
> Request 1: <write new data block>, depends on <>
> Request 2: <write new indirect block> or <add a pointer to existing one>,
> depends on <1>
> Request 3: <update inode block>, depends on <2>.
> In the worst case it will give us lost block, not attached to
> anything. BFD.
> With the current ext2 + bdflush we can get _any_ order at all,
> AFAI can see.

The order of operations that can guarantee consistency at any time
obviously depends on the file system design and on the file operation
being performed. As you partially mentioned, the right order for
adding data to a file could be something like the following for a
UNIXish file system:

0 - Update the allocation map (bitmap or whatever reflects allocation).
1 - Write the data.
2 - Write the indirect block (if needed)
3 - Write the inode.
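
To make the ordering above concrete, here is a minimal sketch in Python.
It assumes a toy in-memory model of the disk: the `Disk` class, the
`step_*` functions and the block number are all hypothetical and do not
reflect the real ext2 layout. The sketch replays the four writes,
simulates a crash after each prefix, and reports what a reader would
find afterwards.

```python
from dataclasses import dataclass, field

# Hypothetical toy model of on-disk state; NOT the real ext2 layout.
@dataclass
class Disk:
    bitmap: set = field(default_factory=set)    # block numbers marked allocated
    data: dict = field(default_factory=dict)    # block number -> contents written
    indirect: set = field(default_factory=set)  # blocks listed in the new indirect block
    inode_has_indirect: bool = False            # inode points at the indirect block

NEW_BLOCK = 7  # arbitrary block number chosen for the example

def step_bitmap(d):
    d.bitmap.add(NEW_BLOCK)

def step_data(d):
    d.data[NEW_BLOCK] = b"new data"

def step_indirect(d):
    d.indirect.add(NEW_BLOCK)

def step_inode(d):
    d.inode_has_indirect = True

# Steps 0..3 from the list above, in that order.
ORDER = [step_bitmap, step_data, step_indirect, step_inode]

def crash_after(n, order=ORDER):
    """Return the disk state if only the first n writes reached the disk."""
    d = Disk()
    for step in order[:n]:
        step(d)
    return d

def reachable(d):
    """Blocks a reader can reach by following the inode."""
    return set(d.indirect) if d.inode_has_indirect else set()

def check(d):
    """Return (problems, leaked_blocks) for a post-crash state."""
    problems = []
    for b in sorted(reachable(d)):
        if b not in d.data:
            problems.append("exposes unwritten block")
        if b not in d.bitmap:
            problems.append("file uses free block")
    leaked = d.bitmap - reachable(d)
    return problems, leaked

if __name__ == "__main__":
    for n in range(len(ORDER) + 1):
        print(n, check(crash_after(n)))
```

In this toy model, a crash at any point of the ordered sequence yields
no reachable-but-unwritten block; the only damage is a block marked
allocated but not attached to anything. Reversing the list (inode
first, bitmap last) makes `check` report an exposed unwritten block
once the indirect block has been written.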

This seems pretty simple, but in my opinion it understates by an order
of magnitude the real complexity of making things perfect:

1 - The ordering of operations depends on the file operation (already
mentioned).

2 - There may be situations where every possible ordering can lead to
inconsistency if a crash occurs (due to FS misdesign).

3 - Several files that share some meta-data blocks may be accessed
concurrently.

(3) makes me think that you must also enforce some ordering across the
whole file system. If you want to lock access to every dirty block of
meta-data, then you will probably not be significantly faster than
synchronous meta-data writes.
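
A small sketch of why shared meta-data blocks force a file-system-wide
view. It treats "block X must reach disk before block Y" as edges in a
dependency graph and asks Python's standard `graphlib` module for a
valid write order. The block names and the create/remove rules in the
second graph are assumptions made up for illustration; they are not
taken from ext2.

```python
from graphlib import TopologicalSorter, CycleError

def write_order(deps):
    """deps maps a block to the set of blocks that must be written before it.

    Returns one valid write order, or None if the constraints are cyclic.
    """
    try:
        return list(TopologicalSorter(deps).static_order())
    except CycleError:
        return None

# One file being extended: bitmap -> data -> indirect -> inode block.
single_file = {
    "data": {"bitmap"},
    "indirect": {"data"},
    "inode_block": {"indirect"},
}

# Two files whose inodes share an inode block and whose directory
# entries share a directory block (assumed rules, for illustration):
#   creating "a": inode block before directory block
#   removing "b": directory block before inode block
shared_blocks = {
    "dir_block": {"inode_block"},
    "inode_block": {"dir_block"},
}

if __name__ == "__main__":
    print(write_order(single_file))
    print(write_order(shared_blocks))
```

Under these assumed rules the single-file graph has an obvious order,
while the two-file graph has none at whole-block granularity. Something
has to give: serialize the two operations, write one block
synchronously, or track dependencies at a finer grain than the block.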

(2) probably applies to most existing file systems, IMO. I am not sure
of that, since I am far from knowledgeable on this topic.

(1) I invite you to detail all the possible situations for EXT2 if you
have the time. :-)

If we really want to improve the consistency guarantees of existing
file systems, we should probably lean towards compromises rather than
look for the perfect solution.

Happy New Year!

Regards,
Gerard.

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@vger.rutgers.edu
Please read the FAQ at http://www.tux.org/lkml/
