Re: Is any file system on Linux appropriate for very large

Stephen C. Tweedie (sct@dcs.ed.ac.uk)
Thu, 18 Jul 1996 20:42:06 +0100

Messages sorted by: [ date ][ thread ][ subject ][ author ]
Next message: Jared Mauch: "Re: magic file entry to understand core dumps"
Previous message: Malcolm Beattie: "Re: _My_ turn for a 2.1 wishlist"
Next in thread: Linus Torvalds: "Re: Is any file system on Linux appropriate for very large directories?"
Reply: Linus Torvalds: "Re: Is any file system on Linux appropriate for very large directories?"
Reply: Kai Henningsen: "Re: Is any file system on Linux appropriate for very large di"

Hi,

On Mon, 15 Jul 1996 15:05:18 +0300 (EET DST), Linus Torvalds
<torvalds@cs.helsinki.fi> said:

> The problem with hashed directories (or any other kind of "smart"
> directories) is not the lookup phase: that's pretty simple, and is
> almost always speeded up by the hashing (or the hashing is done
> badly).

> The _real_ complexity comes in when you start adding/deleting
> entries. You may get faster lookups, but you're almost sure to get
> slower modifications, and what's worse - keeping track of the extra
> state is making the whole thing more complex.

That's not necessarily true. Certainly, insert/delete becomes more
complex to code, but it can be made very substantially faster as a
result.

The cost of insert is basically the sum of two operations --- a
lookup, to determine uniqueness of the name, and the creation of a new
dirent. The lookup will be speeded up exactly as any other lookup by
a hash table or btree. As for the create part, if we are using a hash
table then we can reserve one hash bucket for empty dirents and as a
result substantially speed up the process of looking for empty space
in the directory. Similar optimisation can be done for the btree.

For delete, we have a corresponding improvement in performance --- the
cost of locating the entry to be deleted is reduced just as the lookup
cost is reduced.

The real extra cost of more complex code only occurs if we are using
truly scalable data structures, such as hierarchial hashing or a
btree. For the btree case, insert and delete may incur rotation costs
which may be quite significant, but a carefully designed btree
algorithm with multiple keys at each node will avoid doing a rotation
for every ins/del operation. The heirarchial hash only incurs extra
cost when a hash bucket grows large enough to need splitting.

A static hash table will reap most of the performance benefits of the
multi-level hash or btree; its only disadvantage will be lack of true
log(n) scalability to _very_ large directory sizes. It will be faster
than the current linear scan for insert, delete and lookup without any
of the large rotation costs which would affect a btree solution.

Cheers,
Stephen.

--
Stephen Tweedie <sct@dcs.ed.ac.uk>
Department of Computer Science, Edinburgh University, Scotland.

Next message: Jared Mauch: "Re: magic file entry to understand core dumps"
Previous message: Malcolm Beattie: "Re: _My_ turn for a 2.1 wishlist"
Next in thread: Linus Torvalds: "Re: Is any file system on Linux appropriate for very large directories?"
Reply: Linus Torvalds: "Re: Is any file system on Linux appropriate for very large directories?"
Reply: Kai Henningsen: "Re: Is any file system on Linux appropriate for very large di"