[PATCH] pipe: Make a partially-satisfied blocking read wait for more data

From: David Howells
Date: Fri Jun 23 2023 - 18:35:23 EST


Hi Linus,

Can you consider merging something like the attached patch? Unfortunately,
there are applications out there that depend on a blocking read from a pipe
waiting until the buffer is full under some circumstances. Commit
a28c8b9db8a1 ("pipe: remove 'waiting_writers' merging logic") removed the
conditionality on there being an attached writer.

I'm not sure this is the best solution, though, as it goes too far the other
way and will now block reads for which there isn't an active writer - and I'm
sure that, somewhere, there's an app that will break on that.

Thanks,
David
---
pipe: Make a partially-satisfied blocking read wait for more data

A read on a pipe may return short after reading some data from a pipe, even
though the pipe isn't non-blocking. This is stated in the read(2) manual
page:

	... It is not an error if this number is smaller than the number of
	bytes requested; this may happen for example because fewer bytes are
	actually available right now (maybe because we were close to
	end-of-file, or because we are reading from a pipe, or from a
	terminal)...

However, some applications depend on a blocking read on a pipe not
returning until it fills the buffer unless it hits EOF or a signal occurs -
at least as long as there's an active writer on the other end.

Fix the pipe reader to restore this behaviour by only breaking out with a
short read in the non-blocking (and signal) cases.
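
To make the intended behaviour concrete, here is a rough userspace model of
the checks the reader makes once the pipe has run empty part-way through a
read. It is only a sketch, not kernel code: empty_pipe_step(), the enum and
the parameter names are made up for illustration, and locking, signal
handling and the actual wait are left out.

```c
#include <errno.h>
#include <stdbool.h>
#include <sys/types.h>

/* What the reader does once the pipe has run empty mid-read. */
enum pipe_read_step { STOP_READING, WAIT_FOR_WRITER };

/*
 * Hypothetical model of the checks touched by this patch.  *ret holds
 * the number of bytes copied so far and may be set to -EAGAIN.
 */
static enum pipe_read_step empty_pipe_step(ssize_t *ret, bool have_writers,
					   bool nonblock)
{
	if (!have_writers)
		return STOP_READING;	/* EOF: return what we have */
	if (nonblock) {
		if (!*ret)
			*ret = -EAGAIN;	/* nothing was read at all */
		return STOP_READING;	/* short read or -EAGAIN */
	}
	return WAIT_FOR_WRITER;		/* blocking: wait for more data */
}
```

With the old "if (ret) break;" check in place, the blocking case with
have_writers set and *ret non-zero would have stopped with a short read
instead of waiting.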

Here's a reproducer for it:

	#include <fcntl.h>
	#include <stdio.h>
	#include <unistd.h>
	#include <stdlib.h>
	#include <sys/uio.h>

	/* Only provided by <fcntl.h> when _GNU_SOURCE is defined */
	#ifndef F_GETPIPE_SZ
	#define F_GETPIPE_SZ 1032
	#endif

	int main(int argc, char *argv[])
	{
		int fildes[2];
		if (pipe(fildes) == -1) {
			perror("in pipe");
			return -1;
		}
		/* Show the pipe capacity as seen from both ends */
		printf("%d %d\n",
		       fcntl(fildes[0], F_GETPIPE_SZ),
		       fcntl(fildes[1], F_GETPIPE_SZ));
		if (fork() != 0) {
			/* Parent: one blocking read, larger than the pipe */
			void *tata = malloc(100000);
			int res = read(fildes[0], tata, 100000);
			printf("could read %d bytes\n", res);
			return -1;
		}
		/* Child: write the whole buffer in a single writev() */
		void *toto = malloc(100000);
		struct iovec iov;
		iov.iov_base = toto;
		iov.iov_len = 100000;
		int d = writev(fildes[1], &iov, 1);
		if (d == -1) {
			perror("in writev");
			return -1;
		}
		printf("could write %d bytes\n", d);
		sleep(1);
		return 0;
	}

It should show the same amount read as written, but shows a short read because
the pipe capacity isn't sufficient.
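
For comparison, an application that does not rely on this behaviour would
typically loop until it has everything it asked for, as read(2) permits
short reads. A minimal sketch of such a loop follows; read_full() is a
hypothetical helper name, not something taken from the reporter's code, and
it assumes a blocking file descriptor.

```c
#include <errno.h>
#include <stddef.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Hypothetical helper: keep calling read() until len bytes have arrived,
 * EOF is reached, or an error occurs.  Returns the number of bytes read,
 * or -1 with errno set if an error occurred before anything was read.
 */
static ssize_t read_full(int fd, void *buf, size_t len)
{
	size_t done = 0;

	while (done < len) {
		ssize_t n = read(fd, (char *)buf + done, len - done);

		if (n == 0)
			break;			/* EOF: all writers closed */
		if (n < 0) {
			if (errno == EINTR)
				continue;	/* interrupted, retry */
			return done ? (ssize_t)done : -1;
		}
		done += (size_t)n;
	}
	return (ssize_t)done;
}
```

Replacing the single read() in the reproducer's parent with such a loop
would be expected to report the full amount with or without this patch.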

Fixes: a28c8b9db8a1 ("pipe: remove 'waiting_writers' merging logic")
Reported-by: Franck Grosjean <fgrosjea@xxxxxxxxxx>
Signed-off-by: David Howells <dhowells@xxxxxxxxxx>
Tested-by: Phil Auld <pauld@xxxxxxxxxx>
cc: Linus Torvalds <torvalds@xxxxxxxxxxxxxxxxxxxx>
cc: Alexander Viro <viro@xxxxxxxxxxxxxxxxxx>
cc: Christian Brauner <brauner@xxxxxxxxxx>
cc: linux-fsdevel@xxxxxxxxxxxxxxx
---
fs/pipe.c | 5 ++---
1 file changed, 2 insertions(+), 3 deletions(-)

diff --git a/fs/pipe.c b/fs/pipe.c
index 2d88f73f585a..c5c992f19d28 100644
--- a/fs/pipe.c
+++ b/fs/pipe.c
@@ -340,11 +340,10 @@ pipe_read(struct kiocb *iocb, struct iov_iter *to)

 		if (!pipe->writers)
 			break;
-		if (ret)
-			break;
 		if ((filp->f_flags & O_NONBLOCK) ||
 		    (iocb->ki_flags & IOCB_NOWAIT)) {
-			ret = -EAGAIN;
+			if (!ret)
+				ret = -EAGAIN;
 			break;
 		}
 		__pipe_unlock(pipe);