Re: structure dentry help...

Alexander Viro (viro@math.psu.edu)
Tue, 2 Nov 1999 09:31:24 -0500 (EST)


On Tue, 2 Nov 1999, Jamie Lokier wrote:

> Alexander Viro wrote:
> > You are sorting the thing by inumber and then do tons of stat()? Ouch.
>
> Yes. Actually I avoid the stat() when open(name,
> O_NOFOLLOW|O_DIRECTORY), with optional fstat() gets equivalent results
> in fewer system calls. (System call overhead is significant when the
> data is in cache already).
>
> Either way the inodes are read in inumber order.
>
> > If anything, you might want to start readahead on inode tables. Hmm... You
> > know, it might work _and_ would be completely independent from VFS. I'll
> > look at it.
>
> I don't understand. You mean that reading inode X causes async
> readahead for inodes [X+1...X+99] too? Do you have in mind readahead on
> inumbers (which are not always contiguous on disk) or on block numbers?

I don't think that anything except the filesystem can decide what to do
here. Consider the following variant (I'm not saying that it would be
optimal, etc.):
Add a function ext2_want_inode(). It will find a position of inode
on disk, try to find the corresponding buffer in the cache, add it there
if needed and issue asynchronous read request.
Whenever you think that you might need the inode soon (e.g. in
ext2_readdir()) call ext2_want_inode(). Since we are talking about async
requests we are not going to be blocked - they will be simply inserted
into queue.
Notice that multithreaded userland program buys you only one
thing: each thread issues read request and blocks on it but other threads
are able to feed additional requests into the queue. And you are paying
for threads overhead. IOW, multiple threads are used only as a kludge
allowing to emulate non-blocking reads. Fine, let readdir() issue them and
forget about the threads - kernel will do it better. And forget about
hassles with inumber ordering.

Now, this variant will probably suck quite badly - we will need
some throttling, but I think that it's worth trying. I don't believe that
VFS has any business here - mechanism that leaves everything to filesystem
and is transparent for the rest of the world looks as a saner idea.

> Ideally for my program, I would like to be able to submit a list of
> inodes and read them all in in the best order. Over NFS, that might
> overlapping requests for example. On ext2 on a local disk, that would
> involve feeding a lot of requests to the elevator algorithm so they are
> block ordered.

Why do you want to bother with explicit submitting the list? Hint
the fs that you will need that and let it decide what to do. In the most
primitive variant readdir() may be considered as a hint. Or you can add a
fcntl() for directories (I'm not sure that it's worth bothering - it's
quite possible that we can always do it and win).

> On way is to use threads. As each thread blocks on I/O, so there is
> room for another to submit the next request. This should give better
> results over a network due to the pipelining, and better results locally
> due to block request merging and assistance from the elevator algorithm.
> (But only if the elevator algorithm will insert new requests; I am under
> the impression it does not).
>
> I tried to simulate readahead/overlap in user space using threads, each
> reading a sequential range of inodes pulled from a common list using
> atomic operations. But it was a net loss in all cases but one, and in
> that case the gain wasn't significant.

Ouch. So you are creating a lot of threads (with all associated
context switches, etc.) just to feed a list of async requests to device?
Small wonder that you are getting a performance hit.

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@vger.rutgers.edu
Please read the FAQ at http://www.tux.org/lkml/