Re: Pending splice(file -> FIFO) excludes all other FIFO operations forever (was: ... always blocks read(FIFO), regardless of O_NONBLOCK on read side?)

From: Ahelenia Ziemiańska
Date: Fri Jul 07 2023 - 18:41:35 EST


On Fri, Jul 07, 2023 at 12:10:36PM -0700, Linus Torvalds wrote:
> On Fri, 7 Jul 2023 at 10:21, Christian Brauner <brauner@xxxxxxxxxx> wrote:
> > Forgot to say, fwiw, I've been running this through the LTP splice,
> > pipe, and ipc tests without issues. A hanging reader can be signaled
> > away cleanly with this.
> NOTE! NOTE! NOTE! Once more, this "feels right to me", and I'd argue
> that the basic approach is fairly straightfoward. The patch is also
> not horrendous. It all makes a fair amount of sense. BUT! I haven't
> tested this, and like the previous patch, I really would want people
> to think about this a lot.
>
> Comments? Jens?
I applied the patch upthread + this diff to 4f6b6c2b2f86b7878a770736bf478d8a263ff0bc;
during test setup I got a null deref (building defconfig minus graphics).
Reproducible, full BUG dump attached; trace of
[ 149.878931] <TASK>
[ 149.879533] ? __die+0x1e/0x60
[ 149.880309] ? page_fault_oops+0x17c/0x470
[ 149.881313] ? search_module_extables+0x14/0x50
[ 149.882422] ? exc_page_fault+0x67/0x150
[ 149.883397] ? asm_exc_page_fault+0x26/0x30
[ 149.884426] ? __pfx_pipe_to_null+0x10/0x10
[ 149.885451] ? splice_from_pipe_next+0x129/0x150
[ 149.886580] __splice_from_pipe+0x39/0x1c0
[ 149.887594] ? __pfx_pipe_to_null+0x10/0x10
[ 149.888615] ? __pfx_pipe_to_null+0x10/0x10
[ 149.889635] splice_from_pipe+0x5c/0x90
[ 149.890579] do_splice+0x35c/0x840
[ 149.891407] __do_splice+0x1eb/0x210
[ 149.892176] __x64_sys_splice+0xad/0x120
[ 149.893019] do_syscall_64+0x3e/0x90
[ 149.893798] entry_SYSCALL_64_after_hwframe+0x6e/0xd8

$ scripts/faddr2line vmlinux splice_from_pipe_next+0x129
splice_from_pipe_next+0x129/0x150:
pipe_buf_release at include/linux/pipe_fs_i.h:221
(inlined by) eat_empty_buffer at fs/splice.c:594
(inlined by) splice_from_pipe_next at fs/splice.c:640

I narrowed this down to
echo c | grep c >/dev/null
where grep is
ii grep 3.8-5 amd64 GNU grep, egrep and fgrep
and strace of the same invocation (on the host) ends with
newfstatat(1, "", {st_mode=S_IFCHR|0666, st_rdev=makedev(0x1, 0x3), ...}, AT_EMPTY_PATH) = 0
newfstatat(AT_FDCWD, "/dev/null", {st_mode=S_IFCHR|0666, st_rdev=makedev(0x1, 0x3), ...}, 0) = 0
newfstatat(0, "", {st_mode=S_IFIFO|0600, st_size=0, ...}, AT_EMPTY_PATH) = 0
lseek(0, 0, SEEK_CUR) = -1 ESPIPE (Illegal seek)
read(0, "c\n", 98304) = 2
splice(0, NULL, 1, NULL, 98304, SPLICE_F_MOVE) = 0
close(1) = 0
close(2) = 0
exit_group(0) = ?
+++ exited with 0 +++
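
For anyone who wants to poke at this without grep: the tail of that strace is one read() that drains the pipe followed by one splice() on the same, now empty, pipe. A minimal sketch of that sequence follows. read_then_splice is a hypothetical name of mine and nothing here is taken from grep's source; only the sizes and the flag come from the strace above.

```c
#define _GNU_SOURCE
#include <fcntl.h>
#include <unistd.h>

/*
 * Hypothetical helper (not grep's code): do one read() from `in`, then one
 * splice() from `in` to `out`, using the size and flag seen in the strace.
 * `in` must be a pipe for the splice() to be valid.
 * Stores the read() result in *nread and returns the splice() result.
 */
static ssize_t read_then_splice(int in, int out, ssize_t *nread)
{
    static char buf[98304];

    *nread = read(in, buf, sizeof buf);
    if (*nread < 0)
        return -1;
    return splice(in, NULL, out, NULL, sizeof buf, SPLICE_F_MOVE);
}
```

Calling read_then_splice(0, 1, &n) from main() and running it as "echo c | ./a.out >/dev/null" should issue the same two calls as the strace.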

And can also reproduce it with
echo | { read -r _; exec ./wr; } > /dev/null
(where ./wr is "while (splice(0, 0, 1, 0, 128 * 1024 * 1024, 0) > 0) {}").
However:
echo | ./wr > /dev/null
does NOT crash.
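
Spelled out, the ./wr loop amounts to roughly the following. This is a sketch only: the splice() call is the one quoted above, while the helper name splice_until_done and the byte counting are my additions for illustration.

```c
#define _GNU_SOURCE
#include <fcntl.h>
#include <unistd.h>

/*
 * Hypothetical wrapper around the ./wr loop: keep splicing from `in` to
 * `out` until splice() returns 0 (nothing left) or -1 (error).
 * Returns the total number of bytes moved, or -1 if splice() failed.
 */
static long long splice_until_done(int in, int out)
{
    long long total = 0;
    ssize_t n;

    while ((n = splice(in, NULL, out, NULL, 128 * 1024 * 1024, 0)) > 0)
        total += n;
    return n < 0 ? -1 : total;
}
```

./wr is then just a main() that calls splice_until_done(0, 1).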


Besides that, this doesn't solve the original issue, inasmuch as
./v > fifo &
head fifo &
echo zupa > fifo
(where ./v splices from an empty pty to stdout; v.c attached)
echo still sleeps until ./v dies, though it also succumbs to ^C now.

OTOH, on 4f6b6c2b2f86b7878a770736bf478d8a263ff0bc,
"timeout 10 ./v > fifo &" (then lines 2 and 3 as above) does
kill ./v -> unblock echo -> head copies "zupa",
i.e. life resumes as normal after the splicer went away.

With the patches, echo zupa is stuck forever (until you signal it)!
This is kinda worse.
[ 149.843966] BUG: kernel NULL pointer dereference, address: 0000000000000008
[ 149.845820] #PF: supervisor read access in kernel mode
[ 149.847190] #PF: error_code(0x0000) - not-present page
[ 149.848540] PGD 0 P4D 0
[ 149.849231] Oops: 0000 [#1] PREEMPT SMP PTI
[ 149.850345] CPU: 0 PID: 230 Comm: grep Not tainted 6.4.0-12317-gabf530ed3e36-dirty #3
[ 149.852411] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.16.2-debian-1.16.2-1 04/01/2014
[ 149.854900] RIP: 0010:splice_from_pipe_next+0x129/0x150
[ 149.856328] Code: ff c6 45 38 00 eb af 5b b8 00 fe ff ff 5d 41 5c 41 5d c3 cc cc cc cc 48 8b 46 10 41 83 c5 01 48 89 df 48 c7 46 10 00 00 00 00 <48> 8b 40 08 e8 ce a5 9a
[ 149.861118] RSP: 0018:ffffb2ed40347d70 EFLAGS: 00010202
[ 149.862488] RAX: 0000000000000000 RBX: ffff8c06c1d9a0c0 RCX: 0000000000000000
[ 149.864357] RDX: 0000000000000005 RSI: ffff8c06c8c98028 RDI: ffff8c06c1d9a0c0
[ 149.866217] RBP: ffffb2ed40347de0 R08: 0000000000000001 R09: ffffffffaa428db0
[ 149.868088] R10: 0000000000018000 R11: 0000000000000000 R12: ffff8c06c2625580
[ 149.869950] R13: 0000000000000002 R14: ffff8c06c1d9a0c0 R15: ffffb2ed40347de0
[ 149.871828] FS: 00007fa5a6b3e740(0000) GS:ffff8c06dd800000(0000) knlGS:0000000000000000
[ 149.873937] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 149.875459] CR2: 0000000000000008 CR3: 000000000269a000 CR4: 00000000000006f0
[ 149.877327] Call Trace:
[ 149.878931] <TASK>
[ 149.879533] ? __die+0x1e/0x60
[ 149.880309] ? page_fault_oops+0x17c/0x470
[ 149.881313] ? search_module_extables+0x14/0x50
[ 149.882422] ? exc_page_fault+0x67/0x150
[ 149.883397] ? asm_exc_page_fault+0x26/0x30
[ 149.884426] ? __pfx_pipe_to_null+0x10/0x10
[ 149.885451] ? splice_from_pipe_next+0x129/0x150
[ 149.886580] __splice_from_pipe+0x39/0x1c0
[ 149.887594] ? __pfx_pipe_to_null+0x10/0x10
[ 149.888615] ? __pfx_pipe_to_null+0x10/0x10
[ 149.889635] splice_from_pipe+0x5c/0x90
[ 149.890579] do_splice+0x35c/0x840
[ 149.891407] __do_splice+0x1eb/0x210
[ 149.892176] __x64_sys_splice+0xad/0x120
[ 149.893019] do_syscall_64+0x3e/0x90
[ 149.893798] entry_SYSCALL_64_after_hwframe+0x6e/0xd8
[ 149.894881] RIP: 0033:0x7fa5a6c49dd3
[ 149.895682] Code: 64 89 02 48 c7 c0 ff ff ff ff eb b9 66 2e 0f 1f 84 00 00 00 00 00 90 80 3d 11 18 0d 00 00 49 89 ca 74 14 b8 13 01 00 00 0f 05 <48> 3d 00 f0 ff ff 77 74
[ 149.899538] RSP: 002b:00007ffc83d77768 EFLAGS: 00000202 ORIG_RAX: 0000000000000113
[ 149.901116] RAX: ffffffffffffffda RBX: 0000000000000000 RCX: 00007fa5a6c49dd3
[ 149.902602] RDX: 0000000000000001 RSI: 0000000000000000 RDI: 0000000000000000
[ 149.904048] RBP: 0000564d8aaeb000 R08: 0000000000018000 R09: 0000000000000001
[ 149.905439] R10: 0000000000000000 R11: 0000000000000202 R12: 000000000000000a
[ 149.906832] R13: 0000564d8aaeb010 R14: 0000564d8aaeb000 R15: 0000000000000000
[ 149.908239] </TASK>
[ 149.908692] Modules linked in:
[ 149.909326] CR2: 0000000000000008
[ 149.910050] ---[ end trace 0000000000000000 ]---
[ 149.910986] RIP: 0010:splice_from_pipe_next+0x129/0x150
[ 149.912063] Code: ff c6 45 38 00 eb af 5b b8 00 fe ff ff 5d 41 5c 41 5d c3 cc cc cc cc 48 8b 46 10 41 83 c5 01 48 89 df 48 c7 46 10 00 00 00 00 <48> 8b 40 08 e8 ce a5 9a
[ 149.915639] RSP: 0018:ffffb2ed40347d70 EFLAGS: 00010202
[ 149.916589] RAX: 0000000000000000 RBX: ffff8c06c1d9a0c0 RCX: 0000000000000000
[ 149.917877] RDX: 0000000000000005 RSI: ffff8c06c8c98028 RDI: ffff8c06c1d9a0c0
[ 149.919172] RBP: ffffb2ed40347de0 R08: 0000000000000001 R09: ffffffffaa428db0
[ 149.920457] R10: 0000000000018000 R11: 0000000000000000 R12: ffff8c06c2625580
[ 149.921737] R13: 0000000000000002 R14: ffff8c06c1d9a0c0 R15: ffffb2ed40347de0
[ 149.923021] FS: 00007fa5a6b3e740(0000) GS:ffff8c06dd800000(0000) knlGS:0000000000000000
[ 149.924481] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 149.925529] CR2: 0000000000000008 CR3: 000000000269a000 CR4: 00000000000006f0


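#define _GNU_SOURCE
#include <fcntl.h>
#include <stdlib.h>
#include <sys/sendfile.h>

int main(void) {
	/* Open a pty master and unlock its slave side. */
	int pt = posix_openpt(O_RDWR);
	grantpt(pt);
	unlockpt(pt);
	/* Nothing is ever written to the master, so the slave stays empty. */
	int cl = open(ptsname(pt), O_RDONLY);
	/* Splice from the empty pty to stdout forever. */
	for (;;)
		splice(cl, 0, 1, 0, 128 * 1024 * 1024, 0);
		// sendfile(1, 0, 0, 128 * 1024 * 1024);
}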
