Yes and no. The ext2 filesystem must overprovision the inodes because
it can't add more later. Allocating them on demand would mean less
total space spent on inodes.
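
To put rough numbers on the overprovisioning, here is a
back-of-the-envelope sketch in Python. The 128-byte inode is the
classic ext2 size; the one-inode-per-4-KiB ratio, the 1 GiB filesystem
and the 20,000 files are assumptions picked for illustration, not
measurements.

```python
# Inode space: static table built at mkfs time vs inodes created on demand.
# All figures below are illustrative assumptions.

INODE_SIZE = 128          # bytes per on-disk inode (classic ext2 size)
BYTES_PER_INODE = 4096    # assumed ratio: one inode per 4 KiB of disk


def static_table_bytes(fs_bytes):
    """Space reserved up front for the inode table."""
    return (fs_bytes // BYTES_PER_INODE) * INODE_SIZE


def on_demand_bytes(files_in_use):
    """Space used if an inode only exists once a file is created."""
    return files_in_use * INODE_SIZE


fs_bytes = 1 << 30        # hypothetical 1 GiB filesystem
files = 20_000            # hypothetical number of files actually created

reserved = static_table_bytes(fs_bytes)
used = on_demand_bytes(files)
print(f"static table: {reserved / 2**20:.1f} MiB")
print(f"on demand:    {used / 2**20:.1f} MiB")
```

Under those assumptions the static table reserves 32 MiB whether or
not the inodes are ever used, while on-demand allocation costs about
2.4 MiB.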
The other issue is that we're going to get cache misses anyway. What
I'm trying to do is speed up a cache miss.
When doing a path lookup, the directory must already have been
read. Embedding the inode has the neat advantage that reading the
directory automatically pulls the inode in at a very low cost.
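
As a rough illustration of what "embedding" could look like, here is a
Python sketch of a directory record that carries a few inode fields
next to the name. The record layout, field choice and helper names
are all hypothetical; this is not the ext2 on-disk format.

```python
import struct

# Hypothetical record, NOT the real ext2 layout:
#   rec_len(u16) name_len(u8) file_type(u8)
#   mode(u16) uid(u16) size(u32) mtime(u32)
#   name bytes, padded to a 4-byte boundary
HEADER = struct.Struct("<HBB")
INODE = struct.Struct("<HHII")


def pack_entry(name, mode, uid, size, mtime):
    """Build one directory record with the inode fields embedded."""
    raw = name.encode()
    body = INODE.pack(mode, uid, size, mtime) + raw
    rec_len = HEADER.size + len(body)
    rec_len += -rec_len % 4            # pad record to a 4-byte boundary
    rec = HEADER.pack(rec_len, len(raw), 1) + body
    return rec.ljust(rec_len, b"\0")


def parse_block(block):
    """Walk one directory block; return {name: inode fields}."""
    entries, off = {}, 0
    while off + HEADER.size <= len(block):
        rec_len, name_len, _ftype = HEADER.unpack_from(block, off)
        if rec_len == 0:
            break                      # unused tail of the block
        ino_off = off + HEADER.size
        mode, uid, size, mtime = INODE.unpack_from(block, ino_off)
        name_off = ino_off + INODE.size
        name = block[name_off:name_off + name_len].decode()
        entries[name] = {"mode": mode, "uid": uid,
                         "size": size, "mtime": mtime}
        off += rec_len
    return entries


block = pack_entry("a.txt", 0o100644, 1000, 42, 0)
block += pack_entry("b", 0o100600, 0, 7, 0)
block = block.ljust(1024, b"\0")       # one assumed 1 KiB directory block
found = parse_block(block)
print(found["a.txt"]["size"])
```

The point is only that a single read of the directory block hands back
the inode fields along with the names, with no second lookup.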
> > The terrible cache locality means that we need to do physical
> > I/O to read the file inode a relatively huge percentage of the time.
>
> If you can't cache dir+inodes, you'll still need to do the seeks. If
> the directories are badly fragmented, you'll need to do even more of
> them. It's likely that the top 2 or 3 levels of the directory tree
> will remain fully cached, but if the lowest levels in the tree are not
> completely cached, and if they contain in the order of 100 entries
> each, then preallocating those directories to reduce fragmentation
> ought to be a huge performance gain. The seeks for inodes further up
> the tree shouldn't matter --- if we can cache all of those
> directories, then we can cache their inodes too. At the lowest level,
> each inode is only one seek, but a fragmented 100-entry directory
> could easily be five or ten!
Nod. And yes, the top level directories normally get cached. The
problem is that a cache miss on the lowest level directory does:
    seek to directory
    read
    seek to inode
    read
    seek to first block
    read/write
My thinking is that the middle seek is not needed if you can embed the
inode in the directory. Just cuts that seek out altogether.
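
The cold-cache sequence above can be sketched as a toy model in
Python. The Disk class and the block labels are made up for the
illustration; it only counts reads that miss the cache and says
nothing about real seek times.

```python
class Disk:
    """Counts physical reads; anything already cached is free."""

    def __init__(self):
        self.cache = set()
        self.seeks = 0

    def read(self, what):
        if what not in self.cache:
            self.seeks += 1
            self.cache.add(what)


def open_file(disk, name, embedded):
    disk.read(("dirblock", "lowest"))   # seek to directory, read
    if not embedded:
        disk.read(("inode", name))      # seek to inode, read
    disk.read(("data", name))           # seek to first block, read/write


def cold_seeks(embedded):
    """Seeks for one lookup with nothing from the lowest level cached."""
    disk = Disk()
    open_file(disk, "somefile", embedded)
    return disk.seeks


print("separate inode:", cold_seeks(embedded=False))
print("embedded inode:", cold_seeks(embedded=True))
```

In this model the separate-inode layout costs three seeks per cold
lookup and the embedded layout costs two.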
I can't think of many times the inode is read when the directory isn't
read beforehand... :)
> > The inode embedding in the directory doesn't affect the permissions
> > check at all.
>
> True, but that wasn't the point I was trying to make. The trouble is,
> we're just using multi-level directory hierarchies to fake tree lookup
> because the filesystem doesn't handle single huge directories itself
> very well. If we can get the filesystem to deal with the tree
> internally by implementing btree directories, then we get the same
> performance boost or better, but we no longer have a permission check
> and an inode lookup at every node in the tree. That has _got_ to
> speed things up enormously, as well as eliminating a lot of inode
> caching for branch nodes in the tree.
Indeed. That's a bit more ambitious tho. B-trees bring up a lot of
allocation issues that haven't (IMHO) been as well studied as the
ext2/ffs type allocation strategies.
I was trying to leverage the simplicity and speed of ext2 with a
relatively minor change: no changes to data block allocation etc. The
only difference is that inodes are created on the fly rather than
allocated from a pre-built pool.
The b-tree issue is also fairly orthogonal to the current issue. Even
in a b-tree scheme, you've still got to decide where you put the
inodes. Do you embed them in the b-tree or in a separate inode block?
> It may well help, actually. If you fill a directory simply by
> creating a lot of files in it, then ext2fs will try to place the files
> in the same block group as the parent directory. It will allocate one
> directory block, then as files are created it will create as many file
> data blocks as it can, as sequentially as possible, until the
> directory gets extended --- at which point it will allocate another
> directory block after those files' allocations. This is in fact a
> sure fire way to get directory fragmentation, and would benefit
> greatly from the patch.
ah-hah! I'll give the patch a whirl.
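
A toy model of the allocation pattern described above, in Python. The
allocator simply hands out the next free block in the group, and the
prealloc parameter is my stand-in for directory preallocation as
discussed earlier in the thread, not the patch's actual mechanism. The
20 entries per directory block and one data block per file are
assumptions.

```python
def directory_block_positions(n_files, entries_per_block,
                              blocks_per_file, prealloc=1):
    """Return the block numbers holding the directory after n_files creates.

    prealloc is the number of directory blocks reserved each time the
    directory has to grow (1 means no preallocation).
    """
    next_free = 0
    dir_blocks = []
    capacity = 0
    for i in range(n_files):
        if i >= capacity:
            # Directory is full: grow it at the next free position.
            dir_blocks.extend(range(next_free, next_free + prealloc))
            next_free += prealloc
            capacity += prealloc * entries_per_block
        next_free += blocks_per_file    # the file's data lands right after
    return dir_blocks


def fragments(blocks):
    """Number of contiguous runs, i.e. seeks needed to read them all."""
    return sum(1 for i, b in enumerate(blocks)
               if i == 0 or b != blocks[i - 1] + 1)


plain = directory_block_positions(100, 20, 1)
reserved = directory_block_positions(100, 20, 1, prealloc=5)
print(plain, fragments(plain))
print(reserved, fragments(reserved))
```

With these numbers a 100-entry directory ends up in five separate
fragments without preallocation and in one contiguous run with it.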
> Cheers,
> Stephen.