Re: VFS 64-bit clean

Theodore Y. Ts'o (tytso@MIT.EDU)
Mon, 23 Feb 1998 15:20:04 -0500


Date: Sat, 21 Feb 1998 21:32:01 +0200 (EET)
From: Samuli Kaski <samkaski@cs.helsinki.fi>

I wouldn't waste time trying to make amazingly clever ext2fs extensions
playing with reserved/unused bits in the structures. My feeling is that
the direction should be ext3fs (based on ext2fs).

It would only take someone with the balls to get things going and the
already known linux-gurus to give their input so that ext3fs gets
designed in a way that allows future enhancements without noticeable
performance degradation.

There has been general enthusiasm for a "new" filesystem, but not
necessarily consensus on which new features it should contain.

To take the simplest example: when people say that they want 64-bit
support, do they mean:

* support for sparse files greater than 2GB?
* support for files which contain more than 2**42 bytes of data?
* support for large files that can be mmap'ed, or just read?
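
To make the sizes in these questions concrete, here is a minimal sketch in C. It assumes the 2GB figure comes from a signed 32-bit file offset; the helper name fits_32bit_offset and the macro names are hypothetical and exist only for this illustration.

```c
#include <stdint.h>

/* Largest file size a signed 32-bit offset can describe: 2^31 - 1 bytes.
 * Assumption: this is where the "2GB" limit in the discussion comes from. */
#define MAX_SIZE_32BIT ((int64_t)INT32_MAX)

/* 2**42 bytes, the size mentioned in the second question. */
#define SIZE_2_42 ((int64_t)1 << 42)

/* Hypothetical helper: returns 1 if a file of `size` bytes can be fully
 * described with signed 32-bit offsets, 0 otherwise. */
int fits_32bit_offset(int64_t size)
{
    return size >= 0 && size <= MAX_SIZE_32BIT;
}
```

A file one byte short of 2GB still fits; a file of exactly 2**31 bytes does not, and 2**42 bytes is 4096 times larger than 1GB, so it is far outside the 32-bit range.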

The requirements for all of these are different, and satisfying some of
them will have tradeoffs in such things as performance for the common
case (where the common case is *still* files less than 2GB). For
example, support for 64-bit non-sparse files on i386 machines will
require VM changes which Linus has already said he's not willing to
make, because of the performance hit you would take in needing to
manipulate 64-bit VM addresses everywhere.

Also, in general, compatibility with ext2 isn't the problem. We have
the right tools in the superblock to make this as painless as possible.
The hard part is actually *coding* the new features in such a way that
they are robust, maintainable, and efficient.
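
As a rough illustration of what "the right tools in the superblock" could look like, here is a sketch in C. It assumes the tools are feature bitmasks split into three classes (compatible, read-only compatible, incompatible), which is how I understand the ext2 superblock feature sets to work. The struct, enum, flag values, and the function decide_mount are all hypothetical names for this example, not kernel code.

```c
#include <stdint.h>

/* Hypothetical feature bits, for illustration only. */
#define FEAT_COMPAT_NEW_DIR_FORMAT  0x0001u  /* old kernels may ignore it   */
#define FEAT_RO_COMPAT_EXAMPLE      0x0001u  /* old kernels may only read   */
#define FEAT_INCOMPAT_EXAMPLE       0x0001u  /* old kernels must not mount  */

/* The three feature classes, as recorded on disk or as known to a kernel. */
struct feature_sets {
    uint32_t compat;
    uint32_t ro_compat;
    uint32_t incompat;
};

enum mount_mode { MOUNT_REFUSE, MOUNT_READ_ONLY, MOUNT_READ_WRITE };

/* Decide how a kernel that understands `known` may mount a filesystem
 * that advertises `on_disk`. Unknown compat bits are deliberately ignored. */
enum mount_mode decide_mount(const struct feature_sets *on_disk,
                             const struct feature_sets *known)
{
    if (on_disk->incompat & ~known->incompat)
        return MOUNT_REFUSE;        /* an unknown incompatible feature */
    if (on_disk->ro_compat & ~known->ro_compat)
        return MOUNT_READ_ONLY;     /* safe to read, unsafe to write   */
    return MOUNT_READ_WRITE;
}
```

Under this scheme, a new feature placed in the compat class leaves old kernels free to mount read-write, and the cost of compatibility is one bit test at mount time.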

For example, if someone wants to design a new B-tree based directory
format, and gives me kernel and user-level routines for manipulating
it, it wouldn't be at all difficult to add that into the existing ext2
filesystem. If necessary, we could even trivially set things up so that
old kernels could mount an ext2 filesystem read-write; they just wouldn't
be able to manipulate those directories which were using the new format.
The hard part is not in the compatibility; it's in writing the actual
B-tree directory code.

I could go on; there are also some very interesting designs on the board
for storing the block numbers attached to a file in a run length encoded
format, which is a big win for files which are contiguously stored on
disk, since it eliminates nearly all indirect blocks. Again, doing it
in a way which retains the basic ext2 filesystem format isn't the hard
part. It's actually providing support for the feature in the first place.
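
To show why run length encoding helps contiguous files, here is a minimal sketch in C. It is not one of the designs mentioned above; the struct block_run and the functions encode_runs and lookup_block are hypothetical names, and a real on-disk format would need much more (limits, ordering, holes for sparse files).

```c
#include <stddef.h>
#include <stdint.h>

/* One run: `count` consecutive disk blocks starting at `start`. */
struct block_run {
    uint32_t start;
    uint32_t count;
};

/* Compress a file's block list into runs. Returns the number of runs
 * written, or (size_t)-1 if `out` has room for fewer runs than needed. */
size_t encode_runs(const uint32_t *blocks, size_t n,
                   struct block_run *out, size_t out_cap)
{
    size_t used = 0;
    for (size_t i = 0; i < n; i++) {
        /* Extend the current run if this block directly follows it. */
        if (used > 0 &&
            out[used - 1].start + out[used - 1].count == blocks[i]) {
            out[used - 1].count++;
            continue;
        }
        if (used == out_cap)
            return (size_t)-1;
        out[used].start = blocks[i];
        out[used].count = 1;
        used++;
    }
    return used;
}

/* Map a logical block index within the file to a disk block number.
 * Returns 1 and stores the result on success, 0 if past end of file. */
int lookup_block(const struct block_run *runs, size_t nruns,
                 uint32_t logical, uint32_t *physical)
{
    for (size_t i = 0; i < nruns; i++) {
        if (logical < runs[i].count) {
            *physical = runs[i].start + logical;
            return 1;
        }
        logical -= runs[i].count;
    }
    return 0;
}
```

A fully contiguous file collapses to a single run no matter how large it is, which is where the savings in indirect blocks come from; a badly fragmented file degenerates to one run per block and gains nothing.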

- Ted
