Re: Big files in ext2fs (but not i_osync)

Theodore Y. Ts'o (tytso@MIT.EDU)
Sun, 1 Mar 1998 20:47:11 -0500

Date: Sun, 1 Mar 1998 02:34:02 +0100 (CET)
From: MOLNAR Ingo

what about the following trick to avoid shuffling: direct, double and
triple pointers have their old meaning with size<2G. Once the size is over
the cut-off point, we 'steal' the _first_ direct pointer, and make it a
four-ways pointer instead. The original first pointer is put into the
four-ways block's _last_ table element. This way we'd only have to
special-case block==0, the rest would be just like before, but with
four-ways indirection for offsets above the cut-off.

This is a good way of extending ext2 for handling large sparse files
where we're running into the addressing limit of 2**24 + epsilon blocks.
We should keep in mind what problem we're trying to solve, though. The
discussion was originally centered around trying to find a place to
store the high 32 bits of i_size. I still think i_dir_acl is the best
place to store this, although it's pending negotiations with Remy as to
why he thinks non-directory inodes need to use the i_dir_acl field.
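
For reference, the "2**24 + epsilon" figure can be checked with a few
lines of C. This is only a sketch under stated assumptions: 12 direct
pointers plus one single-, double- and triple-indirect pointer, 4-byte
block numbers, and a 1K block size (256 pointers per indirect block).
The name max_blocks() is made up for illustration.

```c
#include <stdint.h>

/* Assumed layout: 12 direct pointers, then one single-, one double-
 * and one triple-indirect pointer, each block number 4 bytes wide. */
#define N_DIRECT 12u

/* Number of logical blocks addressable for a given block size
 * (hypothetical helper, not taken from the ext2 sources). */
static uint64_t max_blocks(uint32_t block_size)
{
    uint64_t ptrs = block_size / 4;  /* pointers per indirect block */

    return N_DIRECT                  /* direct blocks */
         + ptrs                      /* single indirect */
         + ptrs * ptrs               /* double indirect */
         + ptrs * ptrs * ptrs;       /* triple indirect */
}
```

With 1K blocks this gives 12 + 256 + 65536 + 16777216 = 16843020 blocks,
i.e. 2**24 plus an epsilon of 65804. Larger block sizes push the block
count limit up, but the byte count no longer fits in a 32-bit i_size
either way.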

While we're on the subject of making changes to how the ext2 inode
stores direct block pointers, the far more efficient change to make is
to store extents (triples of starting logical block, starting physical
block, number of blocks) in the inode. This works because most files use
contiguously allocated blocks, so why not take advantage of that
encoding efficiency?
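
To make the triple concrete, here is a rough sketch of what an extent
record and a logical-to-physical lookup might look like. The struct
layout, the field widths and the name extent_lookup() are assumptions
for illustration only, not a proposed on-disk format. A return value of
0 is used to mean "hole", in the same way a zero block pointer is
treated today.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical extent record: maps logical blocks
 * [logical, logical + len) to physical blocks
 * [physical, physical + len). */
struct extent {
    uint32_t logical;   /* starting logical block */
    uint32_t physical;  /* starting physical block */
    uint32_t len;       /* number of blocks in the run */
};

/* Return the physical block backing logical block lblk, or 0 if no
 * extent covers it (a hole). Linear scan; fine for a handful of
 * extents stored directly in the inode. */
static uint32_t extent_lookup(const struct extent *ext, size_t n,
                              uint32_t lblk)
{
    for (size_t i = 0; i < n; i++) {
        if (lblk >= ext[i].logical &&
            lblk - ext[i].logical < ext[i].len)
            return ext[i].physical + (lblk - ext[i].logical);
    }
    return 0;
}
```

With 12-byte records like these, the 60 bytes now occupied by the 15
block pointers would hold five extents, and a fully contiguous file
needs only one of them regardless of its length.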

The hard part is deciding when an inode's blocks should be encoded as
extents, and when it should revert back to the old system. For files
which are created with a lot of seeks and jumping around, it may be much
more efficient to use the old scheme. Most of the time, though, we're
much better off using the extent-based encoding system.
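
One possible way to make that decision, purely as an illustration: count
how many contiguous runs the file's blocks form, and use extents only
if that count fits in the space available. The functions
count_extents() and should_use_extents(), and the max_extents
parameter, are hypothetical; a real policy would likely need to be
smarter about when to convert back and forth.

```c
#include <stddef.h>
#include <stdint.h>

/* blk[i] is the physical block of logical block i, 0 meaning a hole.
 * Returns the number of extents needed to describe the list. */
static size_t count_extents(const uint32_t *blk, size_t n)
{
    size_t runs = 0;

    for (size_t i = 0; i < n; i++) {
        if (blk[i] == 0)
            continue;               /* holes need no extent */
        /* A new run starts at the first block, after a hole, or
         * whenever the physical block is not previous + 1. */
        if (i == 0 || blk[i - 1] == 0 || blk[i] != blk[i - 1] + 1)
            runs++;
    }
    return runs;
}

/* Nonzero if the block list fits in max_extents extent slots. */
static int should_use_extents(const uint32_t *blk, size_t n,
                              size_t max_extents)
{
    return count_extents(blk, n) <= max_extents;
}
```

A file written sequentially collapses to one or two runs, while one
created with a lot of seeking produces close to one run per block and
would stay on the old scheme.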

- Ted
