strace, accept(), ERESTARTSYS and EINTR

From: Phil Endecott
Date: Fri Jan 04 2008 - 16:41:32 EST


Dear Experts,

I have some code like this:

struct sockaddr_in client_addr;
socklen_t client_size = sizeof(client_addr);
int connfd = accept(fd, (struct sockaddr*)(&client_addr), &client_size);
if (connfd == -1) {
    // [1]
    .....report error and terminate......
}
int rc = fcntl(connfd, F_SETFD, FD_CLOEXEC);


I believe that I should be checking for errno==EINTR at [1] and retrying the accept(); currently I'm not doing so.
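A minimal sketch of the kind of retry I believe belongs at [1] is below. The helper name accept_retry is hypothetical and not part of my program. It assumes that the only error worth retrying is EINTR and that every other error should be passed back to the caller unchanged.

```c
#include <errno.h>
#include <stddef.h>
#include <sys/types.h>
#include <sys/socket.h>

/* Hypothetical helper: call accept() and retry for as long as it fails
 * with EINTR. Returns the connected descriptor, or -1 with errno set. */
int accept_retry(int fd, struct sockaddr *addr, socklen_t *addrlen)
{
    /* addrlen is a value-result argument, so remember the caller's
     * buffer size and restore it before every attempt. */
    socklen_t initial_len = (addrlen != NULL) ? *addrlen : 0;

    for (;;) {
        if (addrlen != NULL)
            *addrlen = initial_len;

        int connfd = accept(fd, addr, addrlen);
        if (connfd >= 0)
            return connfd;      /* success */
        if (errno != EINTR)
            return -1;          /* a real error: let the caller report it */
        /* EINTR: a signal interrupted the call, so try again */
    }
}
```

The call at the top of my snippet would then become accept_retry(fd, (struct sockaddr*)(&client_addr), &client_size), with the error branch at [1] left as it is.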

When I strace -f this application - which is multi-threaded - I see this:

[pid 11079] accept(3, <unfinished ...>
[pid 11093] restart_syscall(<... resuming interrupted call ...> <unfinished ...>
[pid 8799] --- SIGSTOP (Stopped (signal)) @ 0 (0) ---
[pid 11079] <... accept resumed> 0xbfdaa73c, [16]) = ? ERESTARTSYS (To be restarted)
[pid 8799] read(6, <unfinished ...>
[pid 11079] fcntl64(-512, F_SETFD, FD_CLOEXEC) = -1 EBADF (Bad file descriptor)

This shows accept() "returning" ERESTARTSYS. As I understand it, this is an artefact of how strace works, and my code will not have seen accept() return at all at that point. However, the strace output does not show any other return from accept() before it reports that thread's call to fcntl(). And the first parameter to fcntl(), -512, is the return value from accept(), which should be either -1 or a non-negative descriptor. What is going on here?

Google found a couple of related reports:

http://lkml.org/lkml/2001/11/22/65 - Phil Howard reports getting ERESTARTSYS returned from accept(), not only in the strace output, and fixed his problem by treating it like EINTR. He looked at errno if accept() returned <0, not ==-1.

http://lkml.org/lkml/2005/9/20/135 - Peter Duellings reports seeing accept() return -512 with errno==0.
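To make the two reports concrete, here is a sketch of a defensive check that would cover both of them. It is a workaround in the spirit of Phil Howard's fix, not an explanation of the underlying problem. The names KERNEL_ERESTARTSYS, accept_outcome and classify_accept are hypothetical. The value 512 is an assumption taken from the kernel's internal error list; it does not appear in the userspace <errno.h>, and a program should never normally see it.

```c
#include <errno.h>

/* Assumption: the kernel-internal ERESTARTSYS code is 512. It is not
 * defined in the userspace <errno.h>, hence the local definition. */
#define KERNEL_ERESTARTSYS 512

enum accept_outcome { ACCEPT_OK, ACCEPT_RETRY, ACCEPT_FAIL };

/* Classify the result of accept(): ret is its return value and err is
 * the value of errno read immediately afterwards. */
enum accept_outcome classify_accept(int ret, int err)
{
    if (ret >= 0)
        return ACCEPT_OK;       /* a valid descriptor */

    /* The documented case: -1 with errno == EINTR. */
    if (ret == -1 && err == EINTR)
        return ACCEPT_RETRY;

    /* The Howard case: errno itself holds the leaked kernel code. */
    if (ret == -1 && err == KERNEL_ERESTARTSYS)
        return ACCEPT_RETRY;

    /* The Duellings case, and mine: the raw value -512 is returned. */
    if (ret == -KERNEL_ERESTARTSYS)
        return ACCEPT_RETRY;

    return ACCEPT_FAIL;         /* any other negative return */
}
```

Testing for any negative return rather than only -1 is what keeps the -512 case from being passed on to fcntl() as if it were a descriptor.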


Some more background: I started stracing the program because it was already misbehaving. It had created a larger than usual number of threads (30?). It's quite possible that some process resource limit had been reached; could this have confused the glibc syscall wrapper, causing it to return the mysterious -512? Could it have confused the kernel into returning ERESTARTSYS instead of e.g. E-too-many-sockets-open?

It's also possible that some random memory corruption had occurred; valgrind is on my to-do list. But even if it had, I would expect strace to report a legitimate return from accept() before the call to fcntl(). I saw no sign of any system-wide resource limit being hit that might have affected strace itself.

This is a Debian system running kernel 2.6.21-1 i686, glibc 2.7, and strace 4.5.14.


Many thanks for any suggestions.

Phil.

(You're welcome to Cc: me in any replies.)


