Re: Multithread select() bug

From: Eric Dumazet
Date: Mon May 10 2004 - 17:36:03 EST


Andre Ben Hamou wrote:

Eric Dumazet wrote:

Your program is racy and has undefined behavior.

A thread should not close a handle that another thread, blocked in a system call, is still using.

The race is: if a thread does a close(fd), the fd value may be reused by another thread during an open()/socket()/dup()... syscall, and the first thread could then issue the select() syscall (or read()/write()/...) on the wrong file.
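
The fd reuse described above can be seen even without threads. The following is only a minimal sketch, assuming a POSIX system where a new descriptor always gets the lowest free number; the function name closed_fd_number_is_reused() is made up for illustration. In a multithreaded program the dup() would be issued by some other thread at an unpredictable moment.

```c
#include <unistd.h>

/*
 * Hypothetical helper: closes a descriptor, then allocates a new one,
 * and reports whether the new descriptor got the same number.
 * Returns 1 if the number was reused, 0 if not, -1 on error.
 */
int closed_fd_number_is_reused(void)
{
    int p[2];
    int stale, fresh, reused;

    if (pipe(p) < 0)
        return -1;

    stale = p[0];          /* the number another thread might still hold */
    close(p[0]);           /* "thread 2" closes it */

    fresh = dup(p[1]);     /* any later open()/socket()/dup() would do */
    if (fresh < 0) {
        close(p[1]);
        return -1;
    }

    /* Same number, but it now refers to a different open file. */
    reused = (fresh == stale);

    close(fresh);
    close(p[1]);
    return reused;
}
```

A select() or read() issued on the stale number after this point would silently operate on the write end of the pipe instead of the file the caller had in mind.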


Apologies, but I don't follow this.

It was my understanding that the (potentially) many threads of a single process all share a canonical file descriptor table. Hence as long as the various calls you mention are issued in a guaranteed order, maintaining state as you go (which is what the 1 second sleep in the test code was a very quick and dirty way to almost do), I don't see how a race condition arises.

If I were to replace the sleep (1) with, say a global semaphore or something similar, would your explanation still hold?

So please, how do you guarantee that thread 1 runs *before* thread 2?

Thread 1)
select( fd)

Thread 2)
close(fd)

That's not possible.

The only pthread synchronization primitives are mutexes (or rwlocks, or semaphores). And you cannot release a mutex/rwlock/semaphore *after* thread 1 has entered select().

So yes, there is a race condition.

In your example, it can happen that thread 1 has to wait 10 seconds before calling select(), because of system scheduling. Your sleep(1) cannot guarantee that the close() is done after the select() call has blocked in the kernel. You can try whatever semaphore you want, you won't be able to have a 100% reliable program (even on Solaris).
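
For what it is worth, a commonly used alternative (not something from the test program discussed here) is to never close the fd under the blocked thread at all, and instead have that thread also select() on a second "wake-up" pipe. The sketch below assumes that approach; wait_for_data_or_wakeup() and request_wakeup() are hypothetical names, and EINTR handling is left out for brevity.

```c
#include <sys/select.h>
#include <unistd.h>

/*
 * Hypothetical helper for the waiting thread: blocks until data_fd is
 * readable or a wake-up byte arrives on wake_fd.
 * Returns 1 if data_fd is readable, 0 if woken up, -1 on error.
 */
int wait_for_data_or_wakeup(int data_fd, int wake_fd)
{
    fd_set rfds;
    int maxfd = data_fd > wake_fd ? data_fd : wake_fd;
    char c;

    FD_ZERO(&rfds);
    FD_SET(data_fd, &rfds);
    FD_SET(wake_fd, &rfds);

    if (select(maxfd + 1, &rfds, NULL, NULL, NULL) < 0)
        return -1;

    if (FD_ISSET(wake_fd, &rfds)) {
        /* Consume the wake-up byte so the next call blocks again. */
        if (read(wake_fd, &c, 1) < 0)
            return -1;
        return 0;
    }
    return 1;
}

/*
 * Hypothetical helper for the other thread: instead of close(data_fd),
 * write one byte to the write end of the wake-up pipe.
 * Returns 0 on success, -1 on error.
 */
int request_wakeup(int wake_write_fd)
{
    char c = 'x';

    return write(wake_write_fd, &c, 1) == 1 ? 0 : -1;
}
```

With this arrangement the waiting thread is the only one that ever closes data_fd, after it has returned from select(), so the ordering problem described above does not arise. It does not matter whether the wake-up byte is written before or after select() is entered, because the byte stays in the pipe until it is read.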

Eric

Cheers,

Andre Ben Hamou
Imperial College London

