Re: clone() and pthreads?

Linus Torvalds (torvalds@cs.helsinki.fi)
Fri, 3 May 1996 11:54:37 +0300 (EET DST)


On Thu, 2 May 1996, Christopher J. Shaulis wrote:
>
> I'm actively experimenting with clone() threads on one of my personal
> projects. The idea is to code a 3rd generation web server with
> dedicated threads accepting/processing requests, sending the requested
> objects to the requestor, and executing dynamicly loaded CGI "modules"
> which live in the server's space. Thanks to Randy Chapman for helping
> me organize my thoughts on it.

Ok, it seems people _are_ experimenting with clone(). That's nice, I had
been starting to give up on it completely (I was originally hoping for a
threaded X server, but that never happened so now I have to hope that
somebody else comes up with a "worthwhile" threaded project).

> The biggest obsticle to doing things with clone() threads is the
> complete lack of support. Neither X, libc, db, or anything else is
> threadsafe out of the box.

Indeed. It needs to be jump-started some way, and threads programming is
not exactly simple..

> And there are no
> existing locking/syncronization functions -- though alan cox was nice
> enough to assult me with a clue stick, and I'm trying to learn GAS's
> asm syntax well enough to make an cmpxchg inline so I can make a
> decent semaphore.

Umm.. My suggestion for a semaphore would be something along these lines:

struct semaphore {
long count;
unsigned long waiting; /* bitmap of interested threads */
};

__asm__("lock ; decl %0\n\t" /* subtract one from sem->count */
"jns 1f\n\t" /* success? */
"pushl %1\n\t"
"jsr __wait_semaphore\n\t"
"popl %1\n"
"1:"
:"=m" (sem->count)
:"r" (sem)
:"ax","dx","cx","memory"); /* clobbered by function call */

It does an atomic decrement, and if the semaphore count goes negative it
calls a "__wait_semaphore" function that needs to be done carefully. But the
default case (lock successful) has very little overhead, and the case where
we need to sleep isn't that performance critical anyway (because by that time
we usually have to enter the kernel, so it's going to be expensive anyway).

The unlock action would be something like this:

__asm__("lock ; incl %0"
:"=m" (sem->count));
if (sem->waiting)
__wakeup_semaphore(sem);

Again, the default case would be fast: a unlock of the semaphore when
nobody was waiting on us. The __wait_semaphore and __wakup_semaphore
functions are more complex, because they are the ones that really handle
race conditions, and it's important to do things in the correct order.
Something like

/*
* Silly implementation: threads must be numbered 0-31 due to
* the way we implement the wakeup thing
*/
void __wait_semaphore(struct semaphore * sem)
{
int threadnr = current_thread_number();
sig_block(SIGUSR1); /* whatever we use to wake up a sleeper */
/* tell any wakeups that we're waiting for the semaphore */
set_bit(threadnr, &sem->waiting);

/*
* increment the semaphore count - we decremented it to
* be negative before entering this routine..
*/
repeat:
__asm__("lock ; incl %0":"=m" (sem->count));

/*
* Maybe the locker already unlocked before we blocked
* signals, so we can't depend on only the signal..
*/
if (sem->count <= 0)
sigsuspend(...);

/*
* Now "count" _should_ be positive, but maybe somebody else
* gets it before we do so we need to be careful
*/
__asm__("lock ; decl %0; js %1"
:"=m" (sem->count),"=q" (negative));

if (negative)
goto repeat;

/* Ok, we got the semaphore, remove us from the wait list */
clear_bit(threadnr, &sem->waiting);
}

And then __wakeup_semaphore() goes through the "waiting" bitmap and sends
signals to one of the waiting entries (you obviosuly need a pid <->
threadnumber mapping there). Or something like this.

Note: the above is written on the fly as I wrote this mail, so don't
expect it to work as-is. The exact semantics of "sigblock/suspend()" etc
escape my mind, and you most likely need to do some of the same playing
with the mind of gcc as the linux kernel does in asm-i386/atomic.h to
stop it from optimizing any of the "count" accesses, but maybe the above
is roughly usable. The main idea is that we try to avoid doing anything
at all when we can get the lock immediately (that makes __wait_semaphore
a bit more complex, but we don't want to enter the kernel for the "no
contention" case).

There are also starvation issues, and it might make sense to _not_
increment the semaphore counter before going to sleep, and force the
counter to be zero or negative until the wakeup code selects a process
(round-robin according to the "waiting" bitmask or whatever) that it
wakes up - the above is a rough draft and by no means perfect.

> The only real "problem" that I've noticed, and maybe its not even a
> problem, is that when a thread's parent dies, the thread will continue
> happily until it finds its own reason to die. I sorta expected a
> dying process to take all its child threads with it.

That's the intended behaviour of linux kernel threads - I wanted the
kernel threads to be really just a superset of normal processes. There is
no real distinction between a "process" and a "thread" in linux like
there is in most other environments with threads. In Linux a thread is a
first-class entity, so to say, and when the parent dies that doesn't
affect the children.

For pthreads compliance, we obviously need some way to tell "original
parent died, all threads go away now please", but that should be done in
user level with "at_exit" or maybe a "super-parent" that waits for the
"pthreads" parent and then sends a SIGKILL to the other subthreads to get
the behaviour that other broken threading implementations have.

Linus