b-tree search of filename in a directory

afei@jhu.edu
Sun, 12 Dec 1999 16:39:43 -0500 (EST)


I am wondering why this algorithm is not used in EXT2. Is there
any historical reason for that? Recently we work on
a project to optimize EXT2 by using b-tree or avl-tree to expedite the
search of filename/inode in the directory entry table.

After we thought over it, we believe it is essentially not optimized by
using b-tree. My argument is as following:

The current directory entry table is organized in a linked list way, every
entry has its rec_len that can be used to refer to the next entry. The
most important function to find a file when you do "ls" or anything like
that, is the namei.c:ext2_find_entry. In this function, the read_ahead
reads blocks of directory entry into the memory from either buffer cache
or the block device. The search for the match is done in a linear way,
i.e. the directory entry is processed one by one until the right match of
the filename is found.

To attempt to expedite the search, we want to use b-tree or avl-tree. But
we found that there are two principles we have to follow: 1) the tree
needs to be dynamically allocated or deallocated, it is not going to stay
in the main memory in a static way. So some pages of the the tree can be
swapped out while the dir entry in those pages can become dirty. 2) When
we build the tree, we need to first find the entry that matches the
filename in request. Otherwise, the tree won't help to find the entry.
These two principles have serious implication with tree approach that the
fisrt time to search for a filename won't be optimized because the
overhead in building the avl-tree is large. *Most importantly*, the
directory entry was already found in that case; after that some of the
inodes/directory entry is kept in the buffer cache through hashing, since
they can be dirty and can be swapped out, it is mandatory to rebuild the
tree to find the match for the filename, again it is the overhead and the
problem that the entry has to be found before time.

In short, I believe the b-tree approach to expedite filename search is not
going to work unless the dir entry is essentially reconstructed so that a
directory on disk has both a directory entry table and a filename table.

Fei

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@vger.rutgers.edu
Please read the FAQ at http://www.tux.org/lkml/