Proposal: Event driven file descriptors, take 2

Perry Harrington (pedward@sun4.apsoft.com)
Sun, 21 Jun 1998 17:38:57 -0700 (PDT)


Over the last several days, many people have been arguing about the
validity of threads and how to handle lots of file descriptors in
one program without bringing the machine to its knees.

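select(2) and poll(2) are the most common ways to get "notified" of
a file descriptor change. The problem is that both are polling
interfaces: they wake up the user program when any file descriptor it
is waiting on changes state (wants more data, has more data, or an
exception occurred), but the program must then scan the whole set to
find out which descriptors changed, and must hand the whole set back
to the kernel on the next call.

The cost of each call therefore grows with the number of descriptors
being watched, not the number that are ready, which makes these
interfaces inefficient for very large numbers of file descriptors in
a single process.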
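
To make the scanning cost concrete, here is a minimal sketch of the usual
select(2) pattern. The helper name wait_and_scan and its "scanned" counter
are made up for illustration; only select(), FD_ZERO, FD_SET and FD_ISSET
are real interfaces.

```c
#include <assert.h>
#include <stddef.h>
#include <sys/select.h>
#include <unistd.h>

/*
 * Hypothetical helper: block in select(2) on n descriptors and return how
 * many are readable. If scanned is non-NULL it receives the number of
 * FD_ISSET tests that were needed to find them.
 */
static int wait_and_scan(const int *fds, int n, int *scanned)
{
    fd_set readable;
    int maxfd = -1, ready = 0, tests = 0;

    assert(fds != NULL && n > 0);

    /* The set is rebuilt before every call: select() overwrites it. */
    FD_ZERO(&readable);
    for (int i = 0; i < n; i++) {
        FD_SET(fds[i], &readable);
        if (fds[i] > maxfd)
            maxfd = fds[i];
    }

    if (select(maxfd + 1, &readable, NULL, NULL, NULL) < 0)
        return -1;

    /* select() only reports that something is ready; finding out which
     * descriptors means testing every one that was asked about. */
    for (int i = 0; i < n; i++) {
        tests++;
        if (FD_ISSET(fds[i], &readable))
            ready++;
    }

    if (scanned)
        *scanned = tests;
    return ready;
}
```

With eight descriptors of which one is readable, the loop still performs
eight tests; with thousands of mostly idle descriptors the ratio gets
correspondingly worse.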

I'm proposing a new API for dealing with large numbers of file descriptors.
The API implementation is geared to provide event driven file descriptor
notification without scanning of lists.

The system is built around the notion of registering "FD callbacks".
An FD callback is simply a hook that says: "when this file descriptor
changes state (gets data, wants data, has an exception), put a structure
with information about this descriptor in a queue".

You would have functions to attach an event notifier to an FD, detach
an event notifier, and process an event queue, plus a function that puts
the process to sleep until another event happens.

The pseudo code for a program implementing this would be:

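FD_EVENT queue;
struct timeval tv;

/* attach a notifier once; it stays registered across waits */
register_fd_event(fd,&queue);

/* sleep until an event is queued or the timeout in tv expires */
wait_fd_event(&queue,&tv);

/* drain only the descriptors that actually changed state */
while(get_fd_event(&queue)) {
...
}

...

/* detach the notifier when the descriptor is no longer of interest */
unregister_fd_event(fd,&queue);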

There are several advantages to the above design:

- You register an event notifier for an FD *once*, not every time you want data.
- It's completely event driven from kernel space; the various handlers of FDs in
the kernel do the work of maintaining the event queue and populating it.
- It eliminates the "for 1 to n ... FD_ISSET(fd,&fds)" scan needed for select/poll use.
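
To show what the calling convention might look like from user space, here
is a rough emulation of the four calls on top of poll(2). Everything about
it is an assumption: the layout of FD_EVENT, the fd_event_node type, the
MAX_WATCHED limit, the return values, and the choice to watch only for
readable data are all invented here. Because it sits on poll(), it still
scans internally and has none of the performance benefits of a kernel
implementation; it only illustrates "register once, then drain a queue".

```c
#include <assert.h>
#include <poll.h>
#include <stddef.h>
#include <sys/time.h>
#include <unistd.h>

#define MAX_WATCHED 64 /* arbitrary limit for this sketch */

/* Hypothetical event record: which descriptor changed, and how. */
typedef struct {
    int fd;
    short revents; /* poll(2) flag bits such as POLLIN */
} fd_event_node;

/* Hypothetical queue layout; start from a zeroed FD_EVENT. */
typedef struct {
    struct pollfd watched[MAX_WATCHED]; /* registered descriptors */
    int nwatched;
    fd_event_node pending[MAX_WATCHED]; /* events not yet fetched */
    int head, tail;
} FD_EVENT;

/* Attach a notifier; done once per descriptor, not once per wait. */
static int register_fd_event(int fd, FD_EVENT *q)
{
    assert(q != NULL);
    if (q->nwatched == MAX_WATCHED)
        return -1;
    q->watched[q->nwatched].fd = fd;
    q->watched[q->nwatched].events = POLLIN; /* assumption: reads only */
    q->watched[q->nwatched].revents = 0;
    q->nwatched++;
    return 0;
}

/* Detach a notifier; returns -1 if fd was not registered. */
static int unregister_fd_event(int fd, FD_EVENT *q)
{
    for (int i = 0; i < q->nwatched; i++) {
        if (q->watched[i].fd == fd) {
            q->watched[i] = q->watched[--q->nwatched];
            return 0;
        }
    }
    return -1;
}

/* Sleep until an event arrives or tv expires (NULL waits forever).
 * Unfetched events are dropped; poll() reports them again next time. */
static int wait_fd_event(FD_EVENT *q, const struct timeval *tv)
{
    int ms = tv ? (int)(tv->tv_sec * 1000 + tv->tv_usec / 1000) : -1;
    int n = poll(q->watched, (nfds_t)q->nwatched, ms);

    q->head = q->tail = 0;
    for (int i = 0; n > 0 && i < q->nwatched; i++) {
        if (q->watched[i].revents) {
            q->pending[q->tail].fd = q->watched[i].fd;
            q->pending[q->tail].revents = q->watched[i].revents;
            q->tail++;
        }
    }
    return n;
}

/* Fetch the next queued event, or NULL when the queue is empty. */
static const fd_event_node *get_fd_event(FD_EVENT *q)
{
    return q->head < q->tail ? &q->pending[q->head++] : NULL;
}
```

With these definitions the loop from the pseudo code works nearly as
written: while ((ev = get_fd_event(&queue)) != NULL) visits only the
descriptors that changed, and nothing is re-registered between waits.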

Within the kernel, the implementation would be something like this:

register_fd_event:
- add a node to a list of event notifiers on an FD, with a pointer to an
event queue

wait_fd_event:
- put process to sleep, register an alarm if a timeout is specified

unregister_fd_event:
- remove the event notifier from the FD notifier list

When the status of a file descriptor changes, a node is added to the process's
event queue, and the process is scheduled to be woken up.
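
The kernel-side bookkeeping described above can be modelled in ordinary
user-space C. This is only a toy model under stated assumptions: the
struct and function names (file_model, notifier, event_queue,
model_state_change and so on) and the state constants are invented here,
the "wake_pending" flag stands in for scheduling a wakeup, and a real
kernel implementation would also need locking and a bound on queue growth.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical state codes for the three changes the post mentions. */
enum fd_state { FD_HAS_DATA, FD_WANTS_DATA, FD_EXCEPTION };

struct event_node {              /* one queued event */
    int fd;
    enum fd_state state;
    struct event_node *next;
};

struct event_queue {             /* per-process queue */
    struct event_node *head, *tail;
    int wake_pending;            /* stands in for "schedule a wakeup" */
};

struct notifier {                /* node on a descriptor's notifier list */
    struct event_queue *queue;
    struct notifier *next;
};

struct file_model {              /* stands in for the kernel's file object */
    int fd;
    struct notifier *notifiers;
};

/* register_fd_event: add a notifier node pointing at the queue. */
static int model_register(struct file_model *f, struct event_queue *q)
{
    struct notifier *n;

    assert(f != NULL && q != NULL);
    n = malloc(sizeof *n);
    if (!n)
        return -1;
    n->queue = q;
    n->next = f->notifiers;
    f->notifiers = n;
    return 0;
}

/* unregister_fd_event: unlink and free the matching notifier node. */
static int model_unregister(struct file_model *f, struct event_queue *q)
{
    for (struct notifier **pp = &f->notifiers; *pp; pp = &(*pp)->next) {
        if ((*pp)->queue == q) {
            struct notifier *dead = *pp;
            *pp = dead->next;
            free(dead);
            return 0;
        }
    }
    return -1;
}

/* Called when the descriptor changes state: append one event to every
 * interested queue. Work is proportional to the notifiers on this one
 * descriptor, not to the number of descriptors a process watches. */
static int model_state_change(struct file_model *f, enum fd_state state)
{
    int queued = 0;

    for (struct notifier *n = f->notifiers; n; n = n->next) {
        struct event_node *e = malloc(sizeof *e);
        if (!e)
            return -1;
        e->fd = f->fd;
        e->state = state;
        e->next = NULL;
        if (n->queue->tail)
            n->queue->tail->next = e;
        else
            n->queue->head = e;
        n->queue->tail = e;
        n->queue->wake_pending = 1;
        queued++;
    }
    return queued;
}

/* get_fd_event: pop the oldest event; the caller frees it. */
static struct event_node *model_get_event(struct event_queue *q)
{
    struct event_node *e = q->head;

    if (e) {
        q->head = e->next;
        if (!q->head)
            q->tail = NULL;
    }
    return e;
}
```

A descriptor with no notifiers costs nothing on a state change, and an
idle descriptor never touches the queue at all, which is where the
savings over rescanning a full descriptor set would come from.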

The above design allows programs to maintain large numbers of file descriptors with
little overhead. It eliminates the scanning of descriptor lists, and allows for
the expansion of the number of events handled by a process (e.g., socket is closed,
notify process).

I definitely think that an efficient and robust interface is needed for Linux to
keep up with the demand that people place on it. Linux is a really great OS, and
people are pushing it to its limits; we need user level APIs that can exploit the
power of Linux.

--Perry

-- 
Perry Harrington       Linux rules all OSes.    APSoft      ()
email: perry@apsoft.com 			Think Blue. /\
