[patch] nonempty pipe does not select for write

stano@trillian.eunet.sk
Sun, 19 Dec 1999 11:50:40 +0100 (CET)


Hello,

I am repeating the main points of a discussion in
a local maling list (for a demonstrating program
see my mail to l-k of 26. 9. 1999 with subject
"nonempty pipe does not select for write").

In the current pipe implementation there is no possibility
to write to a pipe filedescriptor, when there is something
in the pipe that was not read. This is bad, when the
connected processes do some signalling via the pipe -
when the writing process writes a single byte, it
can not proceed until the byte is read by the peer.

The theory is that if the PIPE_BUF is set to the actual
kernel buffer size for the pipe, it is in fact impossible
to do anything else - the pipe writes up to PIPE_BUF
are guaranteed to be atomic and if the descriptor selects
for write, it should be guaranteed that the next write
does not block. As the amount of the data in the send
is not known at the time of select, it is impossible
to let the user do a write when the remaining size of
the kernel buffer is less than PIPE_BUF.

The kernel pipe code is actually prepared for changing
this and differentiates between PIPE_BUF and PIPE_SIZE.
I am appending a patch against 2.3.33 that allocates
2 * PIPE_BUF and allows a poll for write when there
is at least PIPE_BUF space free. Could someone please
review it if it makes sense? - it works for me, but
I am no kernel expert and do not know the exact semantics
behind kernel poll routines etc.

For a simple test program sending one byte at a time the
context switch rate went down approximately ten times.

Regards

-- 
				Stano

--- linux/include/linux/pipe_fs_i.h.orig Sun Nov 21 20:17:45 1999 +++ linux/include/linux/pipe_fs_i.h Sat Dec 18 16:30:34 1999 @@ -13,7 +13,7 @@ /* Differs from PIPE_BUF in that PIPE_SIZE is the length of the actual memory allocation, whereas PIPE_BUF makes atomicity guarantees. */ -#define PIPE_SIZE PAGE_SIZE +#define PIPE_SIZE 2*PAGE_SIZE #define PIPE_SEM(inode) (&(inode).i_sem) #define PIPE_WAIT(inode) (&(inode).i_pipe->wait) --- linux/fs/pipe.c.orig Sun Dec 5 17:42:03 1999 +++ linux/fs/pipe.c Sun Dec 19 11:19:59 1999 @@ -304,9 +304,11 @@ poll_wait(filp, PIPE_WAIT(*inode), wait); /* Reading only -- no need for aquiring the semaphore. */ - mask = POLLIN | POLLRDNORM; - if (PIPE_EMPTY(*inode)) - mask = POLLOUT | POLLWRNORM; + mask = 0; + if (!PIPE_EMPTY(*inode)) + mask |= POLLIN | POLLRDNORM; + if (PIPE_FREE(*inode) >= PIPE_BUF) + mask |= POLLOUT | POLLWRNORM; if (!PIPE_WRITERS(*inode)) mask |= POLLHUP; if (!PIPE_READERS(*inode)) @@ -329,9 +331,11 @@ poll_wait(filp, PIPE_WAIT(*inode), wait); /* Reading only -- no need for aquiring the semaphore. */ - mask = POLLIN | POLLRDNORM; - if (PIPE_EMPTY(*inode)) - mask = POLLOUT | POLLWRNORM; + mask = 0; + if (!PIPE_EMPTY(*inode)) + mask |= POLLIN | POLLRDNORM; + if (PIPE_FREE(*inode) >= PIPE_BUF) + mask |= POLLOUT | POLLWRNORM; if (!PIPE_READERS(*inode)) mask |= POLLERR; @@ -390,7 +394,7 @@ if (!PIPE_READERS(*inode) && !PIPE_WRITERS(*inode)) { struct pipe_inode_info *info = inode->i_pipe; inode->i_pipe = NULL; - free_page((unsigned long) info->base); + kfree(info->base); kfree(info); } else { wake_up_interruptible(PIPE_WAIT(*inode)); @@ -568,7 +572,7 @@ if (!inode) goto fail_inode; - page = __get_free_page(GFP_USER); + page = (unsigned long) kmalloc(PIPE_SIZE, GFP_USER); if (!page) goto fail_iput;

- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majordomo@vger.rutgers.edu Please read the FAQ at http://www.tux.org/lkml/