Re: 2.6.27.9: splice_to_pipe() hung (blocked for more than 120 seconds)

From: Vegard Nossum
Date: Sun Jan 18 2009 - 09:10:24 EST


On Sun, Jan 18, 2009 at 2:44 PM, Vegard Nossum <vegard.nossum@xxxxxxxxx> wrote:
> So in short: Is it possible that inode_double_lock() in
> splice_from_pipe() first locks the pipe mutex, THEN locks the
> file/socket mutex? In that case, there should be a lock imbalance,
> because pipe_wait() would unlock the pipe while the file/socket mutex
> is held.
>
> That would possibly explain the sporadicity of the lockup; it depends
> on the actual order of the double lock.
>
> Why doesn't lockdep report that? Hm. I guess it is because these are
> both inode mutexes and lockdep can't detect a locking imbalance within
> the same lock class?
>
> Anyway, that's just a theory. :-) Will try to confirm by simplifying
> the test-case.

Hm, I do believe this _is_ evidence in favour of the theory:

top - 09:03:57 up 2:16, 2 users, load average: 129.27, 49.28, 21.57
Tasks: 161 total, 1 running, 95 sleeping, 1 stopped, 64 zombie

:-)

#define _GNU_SOURCE	/* needed for splice() */

#include <sys/socket.h>
#include <sys/types.h>

#include <fcntl.h>
#include <errno.h>
#include <pthread.h>
#include <signal.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

static int sock_fd[2];
static int pipe_fd[2];

#define N 16384

/* Feed the pipe one byte at a time. */
static void *do_write(void *unused)
{
	unsigned int i;

	for (i = 0; i < N; ++i)
		write(pipe_fd[1], "x", 1);

	return NULL;
}

/* Drain the receiving end of the socket pair one byte at a time. */
static void *do_read(void *unused)
{
	unsigned int i;
	char c;

	for (i = 0; i < N; ++i)
		read(sock_fd[0], &c, 1);

	return NULL;
}

/* Move one byte at a time from the pipe to the socket. */
static void *do_splice(void *unused)
{
	unsigned int i;

	for (i = 0; i < N; ++i)
		splice(pipe_fd[0], NULL, sock_fd[1], NULL, 1, 0);

	return NULL;
}

int main(int argc, char *argv[])
{
	pthread_t writer;
	pthread_t reader;
	pthread_t splicer[2];

	while (1) {
		if (socketpair(AF_UNIX, SOCK_STREAM, 0, sock_fd) == -1)
			exit(EXIT_FAILURE);

		if (pipe(pipe_fd) == -1)
			exit(EXIT_FAILURE);

		/* One writer, one reader, and two splicers on the same pipe. */
		pthread_create(&writer, NULL, &do_write, NULL);
		pthread_create(&reader, NULL, &do_read, NULL);
		pthread_create(&splicer[0], NULL, &do_splice, NULL);
		pthread_create(&splicer[1], NULL, &do_splice, NULL);

		/* If the threads deadlock, these joins never return. */
		pthread_join(writer, NULL);
		pthread_join(reader, NULL);
		pthread_join(splicer[0], NULL);
		pthread_join(splicer[1], NULL);

		printf("failed to deadlock, retrying...\n");
	}

	return EXIT_SUCCESS;
}

$ gcc splice.c -lpthread
$ ./a.out &
$ ./a.out &
$ ./a.out &
(as many as you want; then wait for a bit -- ten seconds works for me)
$ killall -9 a.out
(not all will die -- those are now zombies)


Vegard

--
"The animistic metaphor of the bug that maliciously sneaked in while
the programmer was not looking is intellectually dishonest as it
disguises that the error is the programmer's own creation."
-- E. W. Dijkstra, EWD1036