two pipe bugfixes

From: Andrea Arcangeli
Date: Sun Feb 27 2005 - 23:27:04 EST


Hello,

This testcase from Thomas Crhak:

#include <unistd.h>
#include <sys/select.h>

int
main(int argv, char **argc) {
int inout[2];
fd_set readfds, writefds, exceptfds;
struct timeval timeout;

timeout.tv_sec = 0;
timeout.tv_usec = 0;

FD_ZERO(&readfds);
FD_ZERO(&writefds);
FD_ZERO(&exceptfds);

pipe(inout);
write(inout[1], "qqqqqqqq", 5);

FD_SET(inout[1], &readfds);
FD_SET(inout[1], &writefds);
FD_SET(inout[1], &exceptfds);

select(inout[1] + 1, &readfds, &writefds, &exceptfds, &timeout);

close(inout[0]);

write(inout[1], "qqqqqqqq", 5);
return 0;
}

was returning this in 2.6.9:

pipe([3, 4]) = 0
write(4, "qqqqq", 5) = 5
select(5, [4], [4], [4], {0, 0}) = 1 (in [4], left {0, 0})
close(3) = 0
write(4, "qqqqq", 5) = -1 EPIPE (Broken pipe)

and it started to return this since 2.6.11-rc:

pipe([3, 4]) = 0
write(4, "qqqqq", 5) = 5
select(5, [4], [4], [4], {0, 0}) = 2 (in [4], out [4], left {0,
0})
close(3) = 0
write(4, "qqqqq", 5) = 5

There are two separate bugs: the first is that it doesn't return
-EPIPE because the input side has been closed already, the second is
that it returns POLLIN|POLLOUT. I heard POLLIN|POLLOUT is sometime
detected as a special case that means the input side has disconnected.
But anyway POLLIN|POLLOUT at the same time seems wrong to me (at least I
didn't find anything in SUS specs mentioning this magic for pipes) and
2.6.11-rc definitely breaks python-twisted (which apparently understand
the magic I've heard). But clearly the change in 2.6.11-rc that sets
POLLOUT if not all buffers are full makes plenty of sense for
performance reasons.

IMHO the really wrong thing is that we always set POLLIN (even for
output filedescriptors that will never allow any data to be read).

So now I check if the file is open in read or write mode, and I return
POLLIN or POLLOUT and never both of them at the same time. I've no idea
if this is the correct semantics that all applications expects, but
since we use the same pipe_poll callback for all read/write/rdwr cases,
I believe this is the most correct one and it certainly looks better
than the 2.6.11-rc code that breaks twisted, and this new logic still
allows the optimizations in the 2.6.11-rc to work.

Current output of strace with this patch applied is this:

pipe([3, 4]) = 0
write(4, "qqqqq", 5) = 5
select(5, [4], [4], [4], {0, 0}) = 1 (out [4], left {0, 0})
close(3) = 0
write(4, "qqqqq", 5) = -1 EPIPE (Broken pipe)
--- SIGPIPE (Broken pipe) @ 0 (0) ---
+++ killed by SIGPIPE +++

which is different from 2.6.9 but it makes a lot more sense to me for a
"write-only" fd IMHO, and I had no pratical problems with this patch
yet (not that I've run any big stress test though, just normal misc
usage with all sort of desktop apps and twisted). Comments welcome thanks!

Patch is against 2.6.11-rc5.

Signed-off-by: Andrea Arcangeli <andrea@xxxxxxx>

--- xx/fs/pipe.c.~1~ 2005-02-28 00:43:42.000000000 +0100
+++ xx/fs/pipe.c 2005-02-28 04:47:26.000000000 +0100
@@ -235,6 +235,12 @@ pipe_writev(struct file *filp, const str
down(PIPE_SEM(*inode));
info = inode->i_pipe;

+ if (!PIPE_READERS(*inode)) {
+ send_sig(SIGPIPE, current, 0);
+ ret = -EPIPE;
+ goto out;
+ }
+
/* We try to merge small writes */
if (info->nrbufs && total_len < PAGE_SIZE) {
int lastbuf = (info->curbuf + info->nrbufs - 1) & (PIPE_BUFFERS-1);
@@ -398,8 +404,15 @@ pipe_poll(struct file *filp, poll_table

/* Reading only -- no need for acquiring the semaphore. */
nrbufs = info->nrbufs;
- mask = (nrbufs > 0) ? POLLIN | POLLRDNORM : 0;
- mask |= (nrbufs < PIPE_BUFFERS) ? POLLOUT | POLLWRNORM : 0;
+ mask = 0;
+ /*
+ * Returning POLLIN|POLLOUT for a output channel has special semantics
+ * and it's used by some app to detect when the input has been closed.
+ */
+ if (filp->f_mode & FMODE_READ && nrbufs > 0)
+ mask |= POLLIN | POLLRDNORM;
+ if (filp->f_mode & FMODE_WRITE && nrbufs < PIPE_BUFFERS)
+ mask |= POLLOUT | POLLWRNORM;

if (!PIPE_WRITERS(*inode) && filp->f_version != PIPE_WCOUNTER(*inode))
mask |= POLLHUP;

PS. this still can return POLLIN|POLLOUT if somebody opens a fifo in RDWRITE
mode, but I guess that's ok and the above beahviour still should make sense
for such a corner case usage too.
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/