Re: I/O completion ports for Linux

Richard Gooch (rgooch@atnf.CSIRO.AU)
Wed, 1 Apr 1998 14:01:22 +1000


Theodore Y. Ts'o writes:
> Date: Wed, 1 Apr 1998 11:19:24 +1000
> From: Richard Gooch <rgooch@atnf.CSIRO.AU>
>
> The second patch removed the need for f_op->poll calls (an optional
> flag was added to struct file which could be queried by do_select()
> and do_poll()). This sped up polling by another 3x to 4x. This
> patch required no changes to userspace code.
>
> Off hand, this sounds like a very good idea. Did you consider what
> happens if there are two processes calling select on a shared file
> descriptor?

I don't see that it's any different from the current situation (at
least in 2.1.5x, when I looked at it). When processes on a wait queue
are woken up, they are *all* woken up, and then the processes in
do_select() or do_poll() scan all their fds again.
The main difference is that instead of calling f_op->poll(), which
does the activity test and poll_wait(), these operations are moved
into do_select() and do_poll().
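
To make the difference concrete, here is a rough userspace sketch of
the scan loop. It is not the patch itself: struct sketch_file, its
fields, the SK_* bits and sketch_scan() are all hypothetical names
invented for illustration, and the wait-queue registration that
poll_wait() performs is left out entirely. The only point shown is
that a file carrying the optional flag is tested by reading a mask
directly, while other files still go through the indirect poll call.

```c
#include <stddef.h>

/* Hypothetical stand-ins; these are NOT the kernel's real definitions. */
#define SK_POLLIN  0x1u
#define SK_POLLOUT 0x4u

struct sketch_file {
    int has_ready_flag;       /* assumed: driver keeps ready_mask current */
    unsigned int ready_mask;  /* readiness bits for this file */
    unsigned int (*poll)(struct sketch_file *f); /* fallback, like f_op->poll */
    int poll_calls;           /* counts fallback calls, for illustration */
};

/* Fallback used by files that do not maintain the flag themselves. */
static unsigned int sketch_slow_poll(struct sketch_file *f)
{
    return f->ready_mask;
}

/*
 * One scan pass over n files. Stores the matching events for each file
 * in revents[] and returns how many files had at least one wanted event.
 */
static int sketch_scan(struct sketch_file **files, size_t n,
                       unsigned int wanted, unsigned int *revents)
{
    int ready = 0;

    for (size_t i = 0; i < n; i++) {
        struct sketch_file *f = files[i];
        unsigned int mask;

        if (f->has_ready_flag) {
            mask = f->ready_mask;   /* direct test, no indirect call */
        } else {
            f->poll_calls++;
            mask = f->poll(f);      /* old path via the poll method */
        }

        revents[i] = mask & wanted;
        if (revents[i] != 0)
            ready++;
    }
    return ready;
}
```

Every waiter still rescans its own fd list after a wakeup, exactly as
before, so two processes sharing a descriptor each see the same mask.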

Is there something I'm missing?

> Finally, the third patch created the poll2(2) syscall. This provided a
> more efficient interface to the kernel, and removed the need for an
> application to search all fds to see where there was activity. Since
> the kernel already has to search all fds for activity, it is more
> efficient to pass back to userspace a short list of fds which have
> activity, saving the application the time of searching the big list of
> fds. This new syscall works well for both single-threaded and
> multi-threaded servers.
>
> At some level, that's what the IO completion ports are all about,
> although they add the additional twist that not only do they notify you
> that data is available, but actually transfer the data to the memory
> buffer and tell you how many bytes were transferred. They also don't require
> an additional system call (since you can use something like fcntl to
> register the fd with the I/O completion port).

Well, maybe they don't require an extra syscall, but there is
nevertheless an API change.
Is there some reason why extra options for fcntl(2) are better than a
new syscall?
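
For reference, the shape of a poll2-style interface as described in
the quoted text can be emulated in userspace along these lines. This
is only a sketch under assumptions: the real poll2(2) signature is not
given in this thread, so struct sketch_active and sketch_poll2() are
hypothetical names, and the compaction here runs in userspace on top
of the ordinary poll(2), so it saves nothing. In the kernel version
the short list would be built during the scan the kernel already does.

```c
#include <poll.h>
#include <stddef.h>

/* Hypothetical result record; the real poll2 layout is not shown here. */
struct sketch_active {
    int fd;         /* descriptor that has activity */
    short revents;  /* events reported for it */
};

/*
 * Wait on fds[0..nfds) and copy at most max_out active entries into out[].
 * Returns the number of entries written, 0 on timeout, or -1 if poll()
 * itself failed.
 */
static int sketch_poll2(struct pollfd *fds, nfds_t nfds,
                        struct sketch_active *out, size_t max_out,
                        int timeout_ms)
{
    int n = poll(fds, nfds, timeout_ms);
    size_t written = 0;

    if (n <= 0)
        return n;   /* timeout or error: nothing to report */

    /* Compact the full array into a short list of active descriptors. */
    for (nfds_t i = 0; i < nfds && written < max_out; i++) {
        if (fds[i].revents != 0) {
            out[written].fd = fds[i].fd;
            out[written].revents = fds[i].revents;
            written++;
        }
    }
    return (int)written;
}
```

The caller then loops over only the returned entries instead of
walking the whole pollfd array looking for non-zero revents.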

> The question, then, is if we're going to be modifying the user API,
> what ultimate API is best? A poll2 interface, or an I/O completion style
> interface?

Good question. However, IMHO the first step is to get my second patch
into the kernel, since that doesn't change the API.

> This is not to say that completion ports are without their problems.
> There are also questions of what happens if you try to register more
> than one asynchronous I/O --- does it return an error, overwrite the
> previous I/O request, etc? Do you allow asynchronous reads and writes?
> Since I'm on the road, I still haven't had a chance to look at Robey's
> proposal, but there are some design/API questions that we need to
> consider.
>
> Both the second and third patches would massively improve the
> scalability of polling in Linux. Unfortunately, I didn't manage to
> get either into Linus' kernel, so after perfecting my patches, I
> stopped working on them. If I can get some encouragement from people
> whose opinion has some weight with Linus, I could resurrect these
> patches.
>
> I believe the second patch is definitely worth revisiting and
> considering for inclusion, modulo some design questions that I mentioned
> above. The third patch IMO needs to wait on the higher-level
> architectural question of how we want to provide this kind of
> functionality in general....

Yep. The question is: will Linus accept my second patch? I got a wall
of silence last time around...

Regards,

Richard....
