Thread hang problem on 2.2.4

Chris R. Jones (chris@sb.net)
Mon, 5 Apr 1999 17:02:01 -0700 (PDT)


Hello everyone,

I've been working on a linux program which uses threads very heavily,
and I believe I've encountered a bug with either the linux kernel or
glibc & linux-threads.
I've put a lot of effort into narrowing down this problem to a simple
program which demonstrates this behavior, which I've included below.
I've experienced this both on Redhat-5.2 and earlier releases with
linux-threads installed manually. I've also see this problem on multiple
kernel releases, including kernels 2.0.25 and 2.2.4.
Basically, the problem is this: In a single process, I create three
threads: one which serves as a listener for TCP/IP connections,
the next which serves as the 'server' end of a TCP/IP socket which is created
by the listener thread, and finally a socket for the 'client' end of
the TCP/IP socket which connects to the 'server' end of the TCP/IP socket
through a connect() call (which is serviced by the listner thread).
Once I have these two threads on opposite ends of the TCP/IP connection,
I have the two threads send random-sized chunks of data from 1 to 5000 bytes
to each other. They randomly alternate which thread is the receiver and
which is sender. Eventually the two threads deadlock, both while receiving.
I verified that one of the threads did send the data the other is waiting
to receive.

I'm afraid I'm not sure where to go from here. I'm posting this message
to the linux-kernel, linux-threads and linux-c-programming newsgroups in
hopes this will find its way to the right place. I don't subscribe to the
linux-kernel newsgroup because of its traffic, so please CC: responses to
me.

Here is the program, which I've tried to make as short and
readable as possible. I compile this simply as:

g++ threads_problem.c -lpthread

It generates a lot of output (I'll give an example below) so I run it
like "./a.out >& output" and wait until all the threads go to sleep for
a few seconds.

Please let me know if I can help in any way to
resolve the problem, or provide any additional information.

Thanks,
Chris Jones

--------------------------------------------------------------------------------

#define _REENTRANT
#include <signal.h>
#include <stdlib.h>
#include <sys/types.h>
#include <sys/time.h>
#include <sys/socket.h>
#include <netdb.h>
#include <netinet/in.h>
#include <unistd.h>
#include <fcntl.h>
#include <stdio.h>
#include <iostream.h>
#include <errno.h>
#include <string.h>
#include <pthread.h>

#define PORT_BASE 4500
#define NUM_TRANSFERS 10000
#define MAX_TRANSFER_SIZE 5000

pthread_t server_tid;
pthread_t client_tid;

typedef struct linkinfo_s {
int sock; /* The socket for the TCP/IP connection */
int direction; /* is this socket the server(0), or the client(1)? */
} linkinfo;

volatile int transfer_size[NUM_TRANSFERS];
volatile int transfer_direction[NUM_TRANSFERS]; /* Send from server to client or
* vice-versa?
*/

void *listener_start(void *p_sock);
void *linkstart(void *linkinfo_p);

/* Function: listener_start
* Purpose: This listener thread's only purpose is to wait for the
* client end of the TCP/IP to connect, and then to launch
* a thread to handle the server end of the TCP/IP connection
*/
void *listener_start(void *p_sock)
{
int lsock = (int)p_sock;
int sock;
int i;

if((sock = accept(lsock,0,0))<0){
perror("accept()"); abort();
}
else {
linkinfo *inf;
inf = new linkinfo;
inf->sock = sock;
inf->direction = 0;
if(pthread_create(&server_tid,0,linkstart,(void *)inf)){
perror("pthread_create()"); abort();
}
}
close(lsock);
return 0;
}

/* Function: fullread()
* Purpose: Reads data from socket 'sock' into buffer pointed to by 'buf'
* until 'nob' bytes are read.
*/
int fullread(int sock, char *buf, int nob)
{
int c;
do {
c = read(sock,buf,nob);
if(c < 0){
perror("read");
return -errno;
}
else if(c == 0){
fprintf(stderr,"Peer closed unexpectedly\n");
return 0;
}
else {
buf += c;
nob -= c;
}
} while(nob);
return 0;
}

/* Function: fullwrite()
* Purpose: Writes data from buffer pointed to by 'buf' to socket 'sock'
* until 'nob' bytes are sent.
*/
int fullwrite(int sock, char *buf, int nob)
{
int c;
do {
c = write(sock,buf,nob);
if(c < 0){
perror("write");
return -errno;
}
else if(c == 0){
fprintf(stderr,"Peer closed unexpectedly\n");
return 0;
}
else {
buf += c;
nob -= c;
}
} while(nob);
return 0;
}

/* Function: linkstart()
* Purpose: This function is used as the thread starting-point to handle
* a TCP/IP connection. It is passed a linkinfo structure which
* has the socket to use, and an indication if this is the client
* or server end of the link.
* This function will attempt to send or receive NUM_TRANSFERS
* chunks of data to the other end, whose size and direction are
* indicated by the transfer_size and transfer_direction arrays.
*/
void *linkstart(void *linkinfo_p)
{
char buf[MAX_TRANSFER_SIZE];
linkinfo *info = (linkinfo *)linkinfo_p;
int sock = (int)info->sock;
int i;

for(i=0;i<NUM_TRANSFERS;i++){
if(info->direction == transfer_direction[i]){
fprintf(stderr,"%d %d %d fullread start\n"
,info->direction,i,transfer_size[i]);
fullread(sock,buf,transfer_size[i]);
fprintf(stderr,"%d %d %d fullread stop\n"
,info->direction,i,transfer_size[i]);
}
else{
fprintf(stderr,"%d %d %d fullwrite start\n"
,info->direction,i,transfer_size[i]);
fullwrite(sock,buf,transfer_size[i]);
fprintf(stderr,"%d %d %d fullwrite stop\n"
,info->direction,i,transfer_size[i]);
}
}
close(sock);
return 0;
}

main()
{
pthread_t list_tid;
int lsock,sock;
struct sockaddr_in name;
struct hostent *hp;
int i;

/* First, randomly decide which direction each transfer will go between
* the client and server and the size of the transfer
*/
initstate(1000+getpid(),new char[256],256);
for(i=0;i<NUM_TRANSFERS;i++){
transfer_size[i] = random()%MAX_TRANSFER_SIZE;
transfer_direction[i] = random()%2;
if(!transfer_size[i])
transfer_size[i]++;
}

/* First, start listening on a socket, and launch a thread to
* listen for connections on it
*/
memset(&name,0,sizeof(name));
name.sin_family = AF_INET;
name.sin_port = htons(PORT_BASE+getpid());
if((lsock = socket(AF_INET,SOCK_STREAM,0))<0){
perror("socket"); exit(1);
}
if(bind(lsock,(struct sockaddr *)&name,sizeof(name))){
perror("bind"); exit(1);
}
if(listen(lsock,1)){
perror("listen"); exit(1);
}
if(pthread_create(&list_tid,0,listener_start, (void *)lsock)){
perror("pthread_create"); exit(1);
}

/* Next, make a connect() call to connect to this listener,
* and create a new thread to handle this client end of the socket
*/
if(!(hp = gethostbyname("localhost"))){
fprintf(stderr,"gethostbyname failed w/ err %d\n",h_errno);
abort();
}
memcpy(&name.sin_addr.s_addr,hp->h_addr,hp->h_length);

linkinfo *inf;
if((sock = socket( AF_INET, SOCK_STREAM, 0)) < 0){
perror("socket");
abort();
}
if(connect(sock,(struct sockaddr *)&name, sizeof(name)) != 0){
perror("connect");
abort();
}
inf = new linkinfo;
inf->sock = sock;
inf->direction = 1;
if(pthread_create(&client_tid,0,linkstart,(void *)inf)){
perror("pthread_create"); abort();
}

/* Finally wait for all threads to complete */
pthread_join(list_tid,0);
pthread_join(server_tid,0);
pthread_join(client_tid,0);
}

--------------------------------------------------------------------------------

The tail of the output from an example run looks like this:

1 907 3452 fullread start
0 906 2026 fullwrite stop
0 907 3452 fullwrite start
1 907 3452 fullread stop
1 908 4735 fullread start
0 907 3452 fullwrite stop
0 908 4735 fullwrite start
0 908 4735 fullwrite stop
0 909 2141 fullread start

You can see that the client thread (1) most recently started reading the
908th transfer which was 4735 bytes, and went to sleep on the read.
Control switched to the server thread (0), which then wrote the 908th transfer
and fell asleep reading the 909th transfer. However, the client never
woke up to read the 908th transfer!

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@vger.rutgers.edu
Please read the FAQ at http://www.tux.org/lkml/