Re: Filesystem optimization..

Richard Gooch (rgooch@atnf.CSIRO.AU)
Tue, 30 Dec 1997 12:25:45 +1100


Michael O'Reilly writes:
> ebiederm+eric@npwt.net (Eric W. Biederman) writes:
> > MR> Even in this, there's still a win from not needing to allocate a fixed
> > MR> amount of inodes.
> >
> > And again, see btree-based filesystems. There is reiserfs in the
> > works, as well as my own shmfs filesystem (though because it has
> > different priorities, it doesn't yet keep all inodes in the btree),
> > but basically with such a beast it is possible to keep inodes in the
> > directory tree.
>
> I've had a number of people point these out, but there's not a
> terribly good option for me there. I need a stable filesystem, so I'm
> after the smallest possible change for the largest gain.

Your proposed changes to ext2fs would not exactly be the "smallest
possible change"; they could introduce all kinds of bugs.

> People have also pointed out things like btree-based directory trees,
> etc., but btree directories are a win when you have large directories,
> as opposed to lots of directories.
>
> The critical function I'm trying to optimize is the latency of the
> open() system call.
>
> > MR> In practice, on a large server, it's rare to get a very high level of
> > MR> cache hits (a 3-million-file filesystem would need 384 MB of RAM just
> > MR> to hold the inode tables in the best case, ignoring all the
> > MR> directories, the other metadata, and the on-going disk activity).
> >
> > Perhaps the directory cache is too small for your machine?
>
> There are around 390,000 directories holding those files. Just how big
> did you want the directory cache to get!?
>
> The point is that caching simply won't work. This is something very
> close to random open()'s over the entire filesystem. Unless the cache
> size is greater than the meta-data, the cache locality will always be
> very poor.
>
> So: given that you _are_ going to get a cache miss, how do you speed
> it up? The obvious way is to try to eliminate the separate inode
> seek.
>
> > MR> My example case has less than 100 entries per directory. (LOTS of
> > MR> directories tho).
> >
> > Sounds like a case of a too-small directory cache. ext2 has some
> > fairly slow directory routines, which I notice whenever I do an ls in
> > the usr/X11R6/man/man3 directory, where all of the filenames are too
> > large for the cache. It takes forever, in part because I run zlibc,
> > which stats them all, etc.
>
> The filenames are all 8 letters long. The issue isn't the directory
> cache. The issue is the (IMHO) large number of seeks needed to read
> the first block of a file.
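
Just to put rough numbers on the caching point above (assuming standard
128-byte ext2 inodes and, say, 1 kB blocks -- my guesses, not figures
from your setup):

    3,000,000 inodes      x 128 bytes           ~= 384 MB of inode tables
      390,000 directories x at least one block  ~= 400 MB of directory blocks

so the metadata working set is several hundred megabytes before you
touch a single data block, and near-random open()s over it will indeed
miss the caches almost every time.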

From my quick reading of the ReiserFS paper, it will reduce the number
of seeks required to open a file. It looks like it will do what you
want. Why not download it and benchmark it for *your* application
(something along the lines of the sketch below) and let us know the
results?
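
For concreteness, the kind of micro-benchmark I have in mind is just
timing open() over a large sample of the real pathnames in random
order. A rough sketch follows; the name "openbench.c", the MAX_FILES
limit, and the stdin-driven interface are all my own invention, so
adjust to taste:

/* openbench.c -- crude open() latency micro-benchmark (sketch).
 * Reads pathnames (one per line) on stdin, e.g. from "find . -type f",
 * opens each one in a random order, and reports the average
 * open()+close() time in microseconds.  Build with:
 *     cc -O2 -o openbench openbench.c
 */
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <fcntl.h>
#include <unistd.h>
#include <sys/time.h>

#define MAX_FILES 100000
#define MAX_PATH  1024

int main(void)
{
    static char *paths[MAX_FILES];
    char line[MAX_PATH];
    int n = 0, i;
    struct timeval t0, t1;
    double total_us = 0.0;
    long opened = 0;

    /* Slurp the pathnames from stdin. */
    while (n < MAX_FILES && fgets(line, sizeof(line), stdin)) {
        line[strcspn(line, "\n")] = '\0';
        paths[n++] = strdup(line);
    }
    if (n == 0) {
        fprintf(stderr, "no pathnames on stdin\n");
        return 1;
    }

    /* Shuffle so the opens happen in roughly random order, which is
     * what defeats the caches in the scenario discussed above. */
    srand(getpid());
    for (i = n - 1; i > 0; i--) {
        int j = rand() % (i + 1);
        char *tmp = paths[i];
        paths[i] = paths[j];
        paths[j] = tmp;
    }

    /* Time each open(); skip files that have vanished meanwhile. */
    for (i = 0; i < n; i++) {
        int fd;
        gettimeofday(&t0, NULL);
        fd = open(paths[i], O_RDONLY);
        gettimeofday(&t1, NULL);
        if (fd < 0)
            continue;
        close(fd);
        total_us += (t1.tv_sec - t0.tv_sec) * 1e6 +
                    (t1.tv_usec - t0.tv_usec);
        opened++;
    }

    printf("%ld opens, average open() latency %.1f us\n",
           opened, opened ? total_us / opened : 0.0);
    return 0;
}

Run it once against ext2 and once against reiserfs over a copy of the
same tree, using more files than fit in RAM (or after a remount) so the
caches stay cold, and the difference in average open() latency should
show up directly.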

Regards,

Richard....