RE: Signal driven IO

David Schwartz (davids@webmaster.com)
Tue, 16 Nov 1999 20:54:33 -0800


> From: "David Schwartz" <davids@webmaster.com>
> Date: Tue, 16 Nov 1999 15:12:00 -0800
>
> A clear winner regardless of how many file descriptors are
> ready per "loop pass"? Surely there has to be some number at
> which poll becomes superior. Certainly with artificial cases
> where you overload the CPU and deliberately hit multiple links
> at the same time.
>
> The measure of this inflection point should be the number of
> signals 'backed up'. Clearly, if the backup number is 100% of
> the number of file descriptors, poll would be better.
>
> It's actually not clear poll() is a win; if you have a scenario where
> you have multiple threads (possibly on an SMP system), then by using
> signals you can quickly and easily dispatch jobs to different worker
> threads, simply by having different threads pick up the real-time
> signals.
>
> If you use poll(), you either have to do fancy locking, or you have to
> have one thread which uses IPC to dispatch tasks to each worker
> thread, which is not ideal.

Not at all. For one thing, you can have the same thread that does the
'poll' do all the actual I/O. This works well with send queues and receive
queues. Since non-blocking network I/O is so fast, it's almost impossible
for this I/O thread to become a bottleneck.

If you fear that this doesn't take enough advantage of SMP, you can create
additional I/O threads. Each can 'poll' on a subset of the connections of
interest. You can even get clever and distribute the connections
intelligently -- isolating the busy ones from the quiet ones. This way, your
'larger' poll doesn't need to get called very often.
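The partitioning step could be as simple as the hypothetical helper below
(the event-count heuristic and threshold are my invention, purely to
illustrate the idea): connections whose fds have been ready often lately
go to a small, hot poll set with its own thread, and the rest stay in the
large, quiet set that is polled rarely.

```c
#include <stddef.h>

/* Split connection indices into "busy" and "quiet" sets by how often
 * each fd was ready in some recent window (events[i]).  Returns the
 * number of busy connections; busy[] and quiet[] must each have room
 * for n entries. */
static int partition_busy(const unsigned *events, int n,
                          int *busy, int *quiet, unsigned threshold)
{
    int nbusy = 0, nquiet = 0, i;

    for (i = 0; i < n; i++) {
        if (events[i] >= threshold)
            busy[nbusy++] = i;      /* hot: gets its own small poll */
        else
            quiet[nquiet++] = i;    /* cold: shares the big, rare poll */
    }
    return nbusy;
}
```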

> One other observation, which an engineer from Netscape (who does IMAP
> server implementations for Linux and NT) points out is that NT
> completion ports have two features which RT SIGIO doesn't provide. One
> is processor affinity; on SMP systems, when a completion port becomes
> ready, the system will try to schedule the callback thread on the same
> processor which registered the callback originally. The other thing
> NT does is LIFO ordering when multiple completion ports are to be
> scheduled. Both of these strategies are an attempt to avoid cache
> misses. I'm not sure how much difference these tricks actually make,
> but John seems to think it would make a noticeable difference for his
> application.

I'm not sure how much of a difference they make because there's no easy way
to isolate them and test without them.

One other thing that NT's IOCP has is concurrency control. If you have two
processors, the system will do its best to keep two threads active at any
time, releasing another thread if one of those two blocks. That's kind of
neat too.

DS

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@vger.rutgers.edu
Please read the FAQ at http://www.tux.org/lkml/