Re: The reason to call it 3.0 is the desktop (was Re: [OT] 2.6 not 3.0 - (NUMA))

From: Theodore Ts'o (tytso@mit.edu)
Date: Mon Oct 07 2002 - 19:39:04 EST


On Mon, Oct 07, 2002 at 12:35:26PM -0700, Linus Torvalds wrote:
>
> On Mon, 7 Oct 2002, Daniel Phillips wrote:
> >
> > Ext2 likes to spread directory inodes around the volume so that there is
> > room to keep the associated file blocks nearby. This interacts rather
> > poorly with readahead.
>
> Not a read-ahead problem. It interacts rather poory _full_stop_.
>
> It means that the inode tables are spread all out, the bitmaps are
> fragmented etc, so the disk head has to move all over the disk even when
> only working with one directory tree like the kernel sources.

It depends on what you are doing. BSD, and even XFS, uses the concept
of using cylinder groups or block groups as one of many tools to avoid
file fragmentation and to concetrate locality for files within a
directory. The reason why FAT filesystems have file fragmentation
problems in far more worse way is because they attempt don't have the
concept of a block group, and simply always allocate from the
beginning of the filesystem. This is effectively what would happen if
you had a single block/cylinder group in the filesystem.

> So the problem with spreading stuff out doesn't have anything to do with
> read-ahead, and has everything to do with the basic issue of BAD LOCALITY.
> Locality is _good_, independently of read-ahead and independently of
> medium.

Ironically, as I mentioned, one of the reasons behind the block group
scheme is to *increase* locality for files within a particular
directory. As you point out quite correctly, though, it tends to
destroy locality across an entire directory tree.

Maybe the answer is that we need some way of declaring that some
directory is the root of "a directory tree". That way, the filesystem
can keep directories underneath the directory tree close together, and
the filesystem can try to keep directory trees far apart from each
other.

In order to do something like this, we would just need a filesystem
API extension to allow programs like tar and bitkeeper to give a hint
that a new directory tree is being established --- and ideally, it
needs to be done at mkdir time, so that the filesystem can perform
appropriate do a better job of deciding where to place the initial
root of the "directory tree". Things would also work if you declared
some directory tree to be the root of a "directory tree" after the
directory was initially created, but the allocation hueristics
wouldn't be nearly as effective.

Linus, what do you think about defining a new flag which could be
passed as part of the mode bits to mkdir()? If we allow the
filesystem to get some additional hints from userspace about what the
difference between /usr/src/linux (where directory spreading is a bad
idea) and /usr/home (where directory spreading is a very good idea),
it would make life for the filesystem much easier.

                                                - Ted

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/



This archive was generated by hypermail 2b29 : Mon Oct 07 2002 - 22:01:02 EST