Re: Linux's implementation of poll() not scalable?

From: Dan Kegel (dank@alumni.caltech.edu)
Date: Tue Oct 24 2000 - 01:12:37 EST


On Mon, 23 Oct 2000, Linus Torvalds wrote:
> Here's a suggested "good" interface that would certainly be easy to
> implement, and very easy to use, with none of the scalability issues that
> many interfaces have. ...
> It boils down to one very simple rule: dense arrays of sticky status
> information is good. So let's design a good interface for a dense array.
>
> Basically, the perfect interface for events would be
>
> struct event {
> unsigned long id; /* file descriptor ID the event is on */
> unsigned long event; /* bitmask of active events */
> };
>
> int get_events(struct event * event_array, int maxnr, struct timeval *tmout);
>
> int bind_event(int fd, struct event *event);

http://www.FreeBSD.org/cgi/man.cgi?query=kqueue&apropos=0&sektion=0&manpath=FreeBSD+5.0-current&format=html
describes the FreeBSD kqueue interface for events:

     struct kevent {
             uintptr_t ident; /* identifier for this event */
             short filter; /* filter for event */
             u_short flags; /* action flags for kqueue */
             u_int fflags; /* filter flag value */
             intptr_t data; /* filter data value */
             void *udata; /* opaque user data identifier */
     };

     int kqueue(void);

     int kevent(int kq, const struct kevent *changelist, int nchanges,
             struct kevent *eventlist, int nevents,
             const struct timespec *timeout);

Pretty similar to yours, with the following additions:

Your proposal seems to only have one stream of available events per
process. kqueue() returns a handle to an event queue, and kevent()
takes that handle as a first parameter.

Their proposal uses a single call to both bind (or unbind) an array of fd's
and pick up new events. Probably reduces overhead.

Their proposal allows you to watch not just sockets, but also plain files
or directories. (Hey, haven't we heard people talk about letting apps
do that under Linux? What interface does that use?)

> The really nice part of the above is that it's trivial to implement. It's
> about 50 lines of code, plus some simple logic to various drivers etc to
> actually inform about the events. The way to do this simply is to limit it
> in very clear ways, the most notable one being simply that there is only
> one event queue per process (or rather, per "struct files_struct" - so
> threads would automatically share the event queue). This keeps the
> implementation simple, but it's also what keeps the interfaces simple: no
> queue ID's to pass around etc.

I dislike having only one event queue per process. Makes it hard to use
in a library. ("But wait, *I* was going to use bind_event()!")

> Advantage: everything is O(1), except for "get_event()" which is O(n)
> where 'n' is the number of active events that it fills in.

This is a big win, and is exactly the payoff that Solaris gets with /dev/poll.
 
> Example "server":

See http://www.monkeys.com/kqueue/echo.c for a very similar example server
for kqueue() / kevent().

> You get the idea. Very simple, and looks like it should perform quite
> admirably. With none of the complexities of signal handling, and none of
> the downsides of select() or poll(). Both of the new system calls would be
> on the order of 20-30 lines of code (along with the infrastructure to
> extend the fasync stuff to also be able to handle events)

Go, Linus, go! Let's do it! But don't go *too* simple. I really would
like a few of the things kqueue has.
 
> Yes, I'm probably missing something. But this is the kind of thing I think
> we should look at (and notice that you can _emulate_ this with poll(), so
> you can use this kind of interface even on systems that wouldn't support
> these kinds of events natively).

- Dan

p.s. my collection of notes on kqueue is at
http://www.kegel.com/c10k.html#nb.kqueue
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@vger.kernel.org
Please read the FAQ at http://www.tux.org/lkml/



This archive was generated by hypermail 2b29 : Tue Oct 31 2000 - 21:00:13 EST