Re: (reiserfs) Re: LVM / Filesystems / High availability

Theodore Y. Ts'o (tytso@mit.edu)
Tue, 23 Jun 1998 15:54:44 -0400


Date: Tue, 23 Jun 1998 18:40:06 +0200
From: Florian Lohoff <flo@quit.mediaways.net>

The LVM approach with the "virtual block device" makes many things
much easier. You can keep the filesystem code very simple, and the
LVM code also isn't very complex. The only thing you need to take
care with is the block allocation in the LVM, which you can make as
complex and intelligent as you like, but a bug in there will NOT
cause data to get lost or corrupted.

I disagree; the block allocation issue gets very complex if you try
to treat the LVM as a virtual block device with "holes" in it. It's
much, much simpler if the filesystem is intimately aware of each
logical volume, and knows what size it is.
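
To make the complexity concrete, here is a minimal sketch (in C, with
invented names --- this is not ext2 code) of what the allocator is
forced to do once the device can have unbacked regions: every
candidate block has to be checked against a hole map before it can
be handed out.

struct hole { unsigned long start, len; };

int block_is_backed(unsigned long blk,
                    const struct hole *holes, int nholes)
{
        int i;
        for (i = 0; i < nholes; i++)
                if (blk >= holes[i].start &&
                    blk <  holes[i].start + holes[i].len)
                        return 0;       /* falls in a hole: unusable */
        return 1;       /* backed by a real physical extent */
}

And it isn't just this one check: goal-based allocation, block
clustering, and readahead heuristics would all have to be taught
about the hole map as well.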

I don't think this complicates things. We only need some
interaction between filesystems and devices, like the filesystem
telling the device "I would like you to shrink by 4 GB; tell me
if you are able to do this", "Could you please shrink by 4 GB now;
tell me when you are ready" ...
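
The interaction being proposed might look something like the pair of
ioctls sketched below. To be clear, nothing like this exists in
current kernels; the names, numbers, and semantics are invented
purely for illustration.

#include <sys/ioctl.h>

#define LVM_QUERY_SHRINK  _IOW('L', 1, unsigned long)  /* "could you?" */
#define LVM_DO_SHRINK     _IOW('L', 2, unsigned long)  /* "please do"  */

int shrink_volume(int fd, unsigned long blocks)
{
        /* Ask the device whether giving up `blocks' is possible. */
        if (ioctl(fd, LVM_QUERY_SHRINK, &blocks) < 0)
                return -1;              /* device says it can't */

        /* The filesystem relocates anything living in the doomed
         * region, then tells the device to go ahead. */
        return ioctl(fd, LVM_DO_SHRINK, &blocks);
}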

Life is much more complicated than this, though; usually you don't
want to add and delete physical disks just at the end of the
filesystem, but in the middle. That means the filesystem needs to
know intimately where the logical volumes are on the disk.

There are two choices in this case. The first is to have the
filesystem intimately aware of where all of the volume boundaries
are, thus very much complicating the block allocation code --- never
mind handling the case where part of the inode table is on the
physical disk to be removed. The second is to do a *lot* of copying
of disk blocks when you want to reorganize your physical disks. That
simply doesn't scale to filesystems in the terabyte range.
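
A back-of-the-envelope calculation shows why the copying approach
falls over. Assuming a generous sustained copy rate of about
10 MB/s --- the rate is an assumption; substitute your own disk ---
moving a terabyte of live data works out to roughly a day of pure
I/O:

#include <stdio.h>

int main(void)
{
        double bytes = 1e12;    /* one terabyte of data to evacuate */
        double rate  = 10e6;    /* ~10 MB/s sustained copy (assumed) */
        double secs  = bytes / rate;

        printf("%.0f seconds, about %.1f hours\n", secs, secs / 3600.0);
        return 0;       /* prints: 100000 seconds, about 27.8 hours */
}

And that is before counting the metadata updates needed for every
block that moved.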

BTW: I feel a bit like ext2 is going the Microsoft way of doing
things. Keep as much compatibility as possible, and therefore
accept compromises.

Actually, there have been very few compromises that we've needed to
accept. While compatibility is very important, it certainly isn't
the only design goal we are striving for. Performance and reliability
are also paramount. As a result, for example, Stephen has convinced me
that we probably don't want to structure the B-trees to allow for
forwards compatibility. (Backwards compatibility, yes; just not
forwards compatibility.) Structuring B-tree directories in such a way
that Linux 2.0 kernels could still read them would involve too many
performance compromises, and the moment a filesystem started using
extent maps, it would be incompatible with Linux 2.0 kernels anyway.
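
For readers unfamiliar with the term: an extent map describes a
file's data as runs of consecutive blocks instead of one pointer per
block. A sketch of what a single entry might look like --- the layout
here is hypothetical; ext2 has nothing like this on disk today:

struct extent {
        unsigned int   logical;   /* first file block this run covers */
        unsigned int   physical;  /* first disk block it maps to */
        unsigned short length;    /* number of consecutive blocks */
};

Old kernels walking the classic indirect-block tables would have no
way to interpret such a structure, hence the incompatibility.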

There are two main advantages to sticking with the ext2fs filesystem
design path. The first is robustness --- we can control the relatively
small parts of the filesystem that need to be changed to add B-tree
directories, for example, while keeping the rest of the filesystem code
stable. We don't need to re-invent the rest of the filesystem code.
The second is an easy upgrade path. We can make it much easier for
people to upgrade from their existing ext2 filesystem to one which
supports B-tree directories and extents.

- Ted

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@vger.rutgers.edu