Re: linux-kernel-digest V1 #2486

Jeff Garzik (jgarzik@pobox.com)
Thu, 3 Sep 1998 13:55:58 -0400 (EDT)


Eric Raymond wrote:
> Secondly, threads suck. They're currently fashionable, but it's by no
> means clear that they're any improvement on asynchronous I/O.

> I say this because threads encourage in the time domain all the same
> kinds of aliasing havoc that you can get by being careless with
> pointers in the spatial domain. It's way too hard to mentally model or
> make provable assertions about the global state of a multi-threaded
> program.

A news server is one of the more I/O-intensive applications I've
come across. A modern NNTP server must keep up with a load of up
to a million messages per day (with bursts of 50/sec), and up to 20 GB
per day. Each news article must be stored in the filesystem, in logs,
in a hashed-lookup history database, in an overview (article header)
database, and in an anti-spam database. Bigger sites often have
500 or more incoming NNTP streams to process at once. Network, CPU,
and disk all must be top-notch, or your site falls behind and backlogs.

INN, the most popular news server, uses a single-threaded select()
loop; this has proven a bottleneck time and time again, and the
authors are moving work out to other processes as fast as they can.
The latest versions of INN are better because they abandon the
single-message-per-file storage method of years gone by, but they
are nowhere near the scalability of the servers described below.
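
For reference, the select() model looks something like the minimal
sketch below -- hypothetical names and port, error handling trimmed,
not INN's actual code. Note that every connected client waits while
any one request is serviced; that serialization is the bottleneck:

    /* Single-threaded select() loop: one process multiplexes every
       client socket.  Minimal sketch, not INN's actual code. */
    #include <string.h>
    #include <unistd.h>
    #include <sys/types.h>
    #include <sys/select.h>
    #include <sys/socket.h>
    #include <netinet/in.h>

    int main(void)
    {
        struct sockaddr_in addr;
        fd_set master, rfds;
        int listener, maxfd, fd;

        listener = socket(AF_INET, SOCK_STREAM, 0);
        memset(&addr, 0, sizeof(addr));
        addr.sin_family = AF_INET;
        addr.sin_addr.s_addr = htonl(INADDR_ANY);
        addr.sin_port = htons(1119);    /* arbitrary test port */
        bind(listener, (struct sockaddr *) &addr, sizeof(addr));
        listen(listener, 128);

        FD_ZERO(&master);
        FD_SET(listener, &master);
        maxfd = listener;

        for (;;) {
            rfds = master;              /* select() clobbers its sets */
            if (select(maxfd + 1, &rfds, NULL, NULL, NULL) < 0)
                continue;
            for (fd = 0; fd <= maxfd; fd++) {
                if (!FD_ISSET(fd, &rfds))
                    continue;
                if (fd == listener) {
                    int c = accept(listener, NULL, NULL);
                    if (c >= 0) {
                        FD_SET(c, &master);
                        if (c > maxfd)
                            maxfd = c;
                    }
                } else {
                    char buf[4096];
                    ssize_t n = read(fd, buf, sizeof(buf));
                    if (n <= 0) {       /* EOF or error: drop client */
                        close(fd);
                        FD_CLR(fd, &master);
                    }
                    /* else: parse NNTP commands, store the article...
                       every other client waits while this runs. */
                }
            }
        }
    }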

Diablo, news feeder software from Matt Dillon, uses a low-impact
forking model, where each connection is handed off to a newly-forked
daemon child process. Diablo is VERY efficient, but its forking
nature leaves it open to easy denial-of-service attacks: throw a
bunch of connections at it at once and the load goes through the
roof, from per-process VM usage and fork overhead.
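
In a minimal sketch of that forking model (again hypothetical names,
not Diablo's actual code), the accept loop forks a child to own each
connection; the per-connection fork() is exactly the cost a
connection flood exploits:

    /* Fork-per-connection model: each client gets its own freshly
       forked child process.  Minimal sketch. */
    #include <signal.h>
    #include <string.h>
    #include <unistd.h>
    #include <sys/socket.h>
    #include <netinet/in.h>

    static void serve_client(int fd)
    {
        /* ...speak NNTP on fd, store articles, etc... */
        close(fd);
    }

    int main(void)
    {
        struct sockaddr_in addr;
        int listener = socket(AF_INET, SOCK_STREAM, 0);

        memset(&addr, 0, sizeof(addr));
        addr.sin_family = AF_INET;
        addr.sin_addr.s_addr = htonl(INADDR_ANY);
        addr.sin_port = htons(1119);    /* arbitrary test port */
        bind(listener, (struct sockaddr *) &addr, sizeof(addr));
        listen(listener, 128);
        signal(SIGCHLD, SIG_IGN);       /* auto-reap exited children */

        for (;;) {
            int c = accept(listener, NULL, NULL);
            if (c < 0)
                continue;
            /* Every connection costs a fork() plus a process's worth
               of VM -- a connection flood sends the load sky-high. */
            if (fork() == 0) {
                close(listener);        /* child: serve and exit */
                serve_client(c);
                _exit(0);
            }
            close(c);                   /* parent: child owns the fd */
        }
    }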

HighWind's [commercial] Cyclone and Typhoon software uses threads,
and it beats the pants off of the competition. Their code uses the
thread-per-connection model, which is conceptually very simple to
implement (just like the forking model). Their software redefines
low impact -- you can run a full feed into a box with limited
memory and a single disk drive.
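
A minimal thread-per-connection sketch (hypothetical names, not
HighWind's actual code; build with -lpthread) just swaps fork() for
pthread_create(); the threads all share one address space, which is
where the memory savings come from:

    /* Thread-per-connection model: one pthread per client, all
       sharing a single address space.  Minimal sketch. */
    #include <pthread.h>
    #include <string.h>
    #include <unistd.h>
    #include <sys/socket.h>
    #include <netinet/in.h>

    static void *serve_client(void *arg)
    {
        int fd = (int) (long) arg;
        /* ...speak NNTP on fd; history/overview caches are shared
           with every other thread, no inter-process copying... */
        close(fd);
        return NULL;
    }

    int main(void)
    {
        struct sockaddr_in addr;
        int listener = socket(AF_INET, SOCK_STREAM, 0);

        memset(&addr, 0, sizeof(addr));
        addr.sin_family = AF_INET;
        addr.sin_addr.s_addr = htonl(INADDR_ANY);
        addr.sin_port = htons(1119);    /* arbitrary test port */
        bind(listener, (struct sockaddr *) &addr, sizeof(addr));
        listen(listener, 128);

        for (;;) {
            pthread_t tid;
            int c = accept(listener, NULL, NULL);
            if (c < 0)
                continue;
            /* A thread is far cheaper than a forked process. */
            if (pthread_create(&tid, NULL, serve_client,
                               (void *) (long) c) == 0)
                pthread_detach(tid);    /* no join needed */
            else
                close(c);
        }
    }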

That is a real-world application where threads are a huge win.
IMHO, I/O and memory usage are where threads are most useful.
Play around with the POSIX AIO library and do some comparisons
with other I/O methods...
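
For example, here is a minimal POSIX AIO sketch -- the file name is
hypothetical, and on Linux you typically link with -lrt -- that
queues a read and leaves the process free to do other work while
the kernel completes it:

    /* POSIX AIO: queue a read, keep working, collect the result. */
    #include <aio.h>
    #include <errno.h>
    #include <fcntl.h>
    #include <stdio.h>
    #include <string.h>
    #include <unistd.h>

    int main(void)
    {
        char buf[4096];
        struct aiocb cb;
        int fd = open("article.txt", O_RDONLY);  /* hypothetical */

        if (fd < 0)
            return 1;
        memset(&cb, 0, sizeof(cb));
        cb.aio_fildes = fd;
        cb.aio_buf = buf;
        cb.aio_nbytes = sizeof(buf);
        cb.aio_offset = 0;

        if (aio_read(&cb) < 0)          /* queue the read... */
            return 1;
        /* ...and the process is free to do other work meanwhile. */
        while (aio_error(&cb) == EINPROGRESS)
            usleep(1000);               /* or block in aio_suspend() */
        printf("read %ld bytes asynchronously\n",
               (long) aio_return(&cb));
        close(fd);
        return 0;
    }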

The point is that threads are simply a tool. Saying that threads
suck merely exposes a limited view of computing. Threads are very
useful in some situations and not at all useful in others. With
threads, the OS no longer wastes time and space creating a new
process, and your application no longer wastes time copying data
from process to process. It is no harder to "think" about a
threaded program than about a multi-process one; you simply
perform different actions to accomplish the same end.
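
To illustrate the no-copying point, here is a trivial sketch
(hypothetical names; build with -lpthread) in which two threads
update shared state directly under a mutex; the multi-process
equivalent would need a pipe, a socket, or a shared-memory segment,
plus explicit copies:

    /* Threads share one address space: both touch the counter
       directly; no data crosses a process boundary. */
    #include <pthread.h>
    #include <stdio.h>

    static pthread_mutex_t lock = PTHREAD_MUTEX_INITIALIZER;
    static long articles_stored = 0;    /* shared by all threads */

    static void *feeder(void *arg)
    {
        int i;
        for (i = 0; i < 100000; i++) {
            pthread_mutex_lock(&lock);
            articles_stored++;          /* direct access, no copies */
            pthread_mutex_unlock(&lock);
        }
        return NULL;
    }

    int main(void)
    {
        pthread_t a, b;

        pthread_create(&a, NULL, feeder, NULL);
        pthread_create(&b, NULL, feeder, NULL);
        pthread_join(a, NULL);
        pthread_join(b, NULL);
        printf("stored %ld articles\n", articles_stored);
        return 0;
    }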

The one big downside to threads right now is that development
support under Linux isn't very mature yet (there are still some
really bad thread-related bugs in Linux 2.0, and maybe in 2.1 too).
Developing and debugging threaded apps on Solaris is mature and
straightforward, and as long as the code sticks to POSIX threads,
porting to Linux is easy.

Also check out the ACE library (http://www.cs.wustl.edu/~schmidt/ACE.html).
It's a heavily multi-threaded C++ library for telecomm/networking use.
Over and above its use by several major industrial players, the docs
and statistics provided with ACE demonstrate scalability and speed
far above that of a forking model.

Jeff
