Re: Pending splice(file -> FIFO) always blocks read(FIFO), regardless of O_NONBLOCK on read side?

From: Christian Brauner
Date: Mon Jun 26 2023 - 05:32:28 EST


On Mon, Jun 26, 2023 at 03:12:09AM +0200, Ahelenia Ziemiańska wrote:
> Hi! (starting with get_maintainers.pl fs/splice.c,
> idk if that's right though)
>
> Per fs/splice.c:
> * The traditional unix read/write is extended with a "splice()" operation
> * that transfers data buffers to or from a pipe buffer.
> so I expect splice() to work just about the same as read()/write()
> (and, to a large extent, it does so).
>
> Thus, a refresher on pipe read() semantics
> (quoting Issue 8 Draft 3; Linux when writing with write()):
> 60746 When attempting to read from an empty pipe or FIFO:
> 60747 • If no process has the pipe open for writing, read( ) shall return 0 to indicate end-of-file.
> 60748 • If some process has the pipe open for writing and O_NONBLOCK is set, read( ) shall return
> 60749 −1 and set errno to [EAGAIN].
> 60750 • If some process has the pipe open for writing and O_NONBLOCK is clear, read( ) shall
> 60751 block the calling thread until some data is written or the pipe is closed by all processes that
> 60752 had the pipe open for writing.
>
> However, I've observed that this is not the case when splicing from
> something that sleeps on read to a pipe, and that in that case all
> readers block, /including/ ones that are reading from fds with
> O_NONBLOCK set!
>
> As an example, consider these two programs:
> -- >8 --
> // wr.c
> #define _GNU_SOURCE
> #include <fcntl.h>
> #include <stdio.h>
> int main() {
>   while (splice(0, 0, 1, 0, 128 * 1024 * 1024, 0) > 0)
>     ;
>   fprintf(stderr, "wr: %m\n");
> }
> -- >8 --
>
> -- >8 --
> // rd.c
> #define _GNU_SOURCE
> #include <errno.h>
> #include <fcntl.h>
> #include <stdio.h>
> #include <unistd.h>
> int main() {
>   fcntl(0, F_SETFL, fcntl(0, F_GETFL) | O_NONBLOCK);
>
>   char buf[64 * 1024] = {};
>   for (ssize_t rd;;) {
> #if 1
>     while ((rd = read(0, buf, sizeof(buf))) == -1 && errno == EINTR)
>       ;
> #else
>     while ((rd = splice(0, 0, 1, 0, 128 * 1024 * 1024, 0)) == -1 &&
>            errno == EINTR)
>       ;
> #endif
>     fprintf(stderr, "rd=%zd: %m\n", rd);
>     write(1, buf, rd);
>
>     errno = 0;
>     sleep(1);
>   }
> }
> -- >8 --
>
> Thus:
> -- >8 --
> a$ make rd wr
> a$ mkfifo fifo
> a$ ./rd < fifo                           b$ echo qwe > fifo
> rd=4: Success
> qwe
> rd=0: Success
> rd=0: Success                            b$ sleep 2 > fifo
> rd=-1: Resource temporarily unavailable
> rd=-1: Resource temporarily unavailable
> rd=0: Success
> rd=0: Success
> rd=-1: Resource temporarily unavailable  b$ /bin/cat > fifo
> rd=-1: Resource temporarily unavailable
> rd=4: Success                            abc
> abc
> rd=-1: Resource temporarily unavailable
> rd=4: Success                            def
> def
> rd=0: Success                            ^D
> rd=0: Success
> rd=0: Success                            b$ ./wr > fifo
> -- >8 --
> and nothing. Until you actually type a line (or a few) into teletype b
> so that the splice completes, at which point so does the read.
>
> An even simpler case is
> -- >8 --
> $ ./wr | ./rd
> abc
> def
> rd=8: Success
> abc
> def
> ghi
> jkl
> rd=8: Success
> ghi
> jkl
> ^D
> wr: Success
> rd=-1: Resource temporarily unavailable
> rd=0: Success
> rd=0: Success
> -- >8 --
>
> splice flags don't do anything.
> Tested on bookworm (6.1.27-1) and Linus' HEAD (v6.4-rc7-234-g547cc9be86f4).
>
> You could say this is a "denial of service", since this is a valid
> way of following pipes (and, sans SIGIO, the only portable one),

splice() may block on either of the two file descriptors if they don't
have O_NONBLOCK set, even if SPLICE_F_NONBLOCK is raised.

SPLICE_F_NONBLOCK in splice_file_to_pipe() is only relevant if the pipe
is full. If the pipe isn't full then the write is attempted. That of
course involves reading the data to splice from the source file. If the
source file isn't O_NONBLOCK, that read may block while holding
pipe_lock(). A read() on the pipe has to take the same lock, so it waits
there no matter whether O_NONBLOCK is set on the read side.

If you raise O_NONBLOCK on the source fd in wr.c then your problems go
away. This is pretty long-standing behavior. Splice would have to be
refactored to not rely on pipe_lock(). That's likely major work with a
good portion of regressions if the past is any indication.
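
To make the O_NONBLOCK suggestion concrete, here is a rough sketch of how
the loop in wr.c could look. It assumes the source honors O_NONBLOCK (a
tty or a pipe does) and it moves the waiting into poll(), where no pipe
lock is held. The helper names set_nonblock(), wait_ready() and
splice_all() are made up for illustration and are not part of any API.

```c
#define _GNU_SOURCE
#include <errno.h>
#include <fcntl.h>
#include <poll.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: add O_NONBLOCK to fd's file status flags. */
static int set_nonblock(int fd)
{
	int fl = fcntl(fd, F_GETFL);

	if (fl == -1)
		return -1;
	return fcntl(fd, F_SETFL, fl | O_NONBLOCK);
}

/* Hypothetical helper: sleep in poll() until fd reports `events`
 * (or an error/hangup condition), retrying on EINTR. */
static int wait_ready(int fd, short events)
{
	struct pollfd p = { .fd = fd, .events = events };
	int r;

	while ((r = poll(&p, 1, -1)) == -1 && errno == EINTR)
		;
	return r;
}

/* Hypothetical replacement for the wr.c loop: splice in_fd into out_fd
 * until EOF. Returns the number of bytes moved, or -1 with errno set. */
static ssize_t splice_all(int in_fd, int out_fd)
{
	ssize_t total = 0;

	if (set_nonblock(in_fd) == -1)
		return -1;

	for (;;) {
		ssize_t n = splice(in_fd, NULL, out_fd, NULL,
				   128 * 1024 * 1024, 0);

		if (n > 0) {
			total += n;
			continue;
		}
		if (n == 0)
			return total;	/* EOF on the source */
		if (errno == EINTR)
			continue;
		if (errno != EAGAIN)
			return -1;

		/* Nothing to move right now: wait in poll() instead of
		 * inside splice(), first for input, then for room in the
		 * output, so readers of the pipe are not held up. */
		if (wait_ready(in_fd, POLLIN) == -1 ||
		    wait_ready(out_fd, POLLOUT) == -1)
			return -1;
	}
}
```

With that, main() in wr.c would shrink to a call like splice_all(0, 1)
followed by the existing fprintf(). Note that this changes the file
status flags of the shared open file description behind fd 0, which
other processes holding the same description will see as well.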

If you need the ability to read from a pipe fully asynchronously with
splice right now, then io_uring will at least allow you to punt that read
into an async worker thread, as far as I can tell.