Re: Linus on Linux, Apache and Threads

Andi Kleen (ak@muc.de)
Sun, 25 Apr 1999 01:15:55 +0200


On Sun, Apr 25, 1999 at 12:43:52AM +0200, Chuck Lever wrote:
> On Sat, 24 Apr 1999, Andi Kleen wrote:
> > > > You have multiple threads doing an accept on a single listen socket. As
> > > > soon as a thread finished work it calls accept and gets the next ready
> > > > connection handed from the kernel.
> > >
> > > ...or will be awakened on the connection that was handled by another
> > > thread (because of "wake everyone" handling), and accept() will fail,
> > > causing the infamous "thundering herd".
> >
> > If the load is high enough it doesn't matter, because there will
> > be always enough connections to be returned to an accept after a wakeup.
>
> right, but before you get to this point, there is a performance drop.

Then you have enough cycles left, so it doesn't matter. The server always
has to be a bit oversized to handle traffic peaks, in non-peak situation
you can afford to be a bit less efficient (to complicate code to optimize
this would be wasted time).

Also if the threads pool size is adapting quickly enough it shouldn't
be that bad.

>
> > If it isn't the threads pool should adapt and use less threads which
> > avoids the problem (and a few lost wakeups in the transitions don't harm,
> > because the machine has enough free cycles).
>
> this is sounding more complicated by the minute. you also want to tune
> this so that you have just the right number of threads active to keep the
> L1/L2 caches working at their most efficient. is there any guarantee that
> waiting in accept() won't cause round-robin behavior rather than just
> picking the first couple of threads on the list?

There is no such guarantee (except perhaps if you play with nice values[1]),
but it does not matter when you keep statistics about number of requests/time.
As soon as the average time a thread has to wait in the accept goes below
some time add more threads. If it goes gets above the time kill threads and
lower the time. Costs you a few gettimeofdays() if you don't use a time
keeper thread (a thread that updates a timestamp counter in shared memory)

>
> in other words, it's best to have the number of worker threads be close to
> the number of physical CPUs; otherwise, if the threads are scheduled in
> round-robin fashion, they could constantly knock each others' working set
> out of the CPU caches.
>
> if waiting in accept() does cause the thundering herd problem, that might
> be a good thing - the thread that wins will probably have the best cache
> foot-print.

Only real tests can show. Anyways, if the kernel accept() behaviour
should really cause problems (I would guess it doesn't by intuition, but I
have no data), then accept() should be fixed, not complicated code added
to th e user application.

-Andi

[1] I wouldn't suggest that because of the possible nasty interactions
with other server processes running on the same machine.

-- 
This is like TV. I don't like TV.

- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majordomo@vger.rutgers.edu Please read the FAQ at http://www.tux.org/lkml/