Re: [GIT PULL] pipe: nonblocking rw for io_uring

From: Jens Axboe
Date: Mon Apr 24 2023 - 18:45:13 EST


On 4/24/23 4:07?PM, Jens Axboe wrote:
> On 4/24/23 3:58?PM, Linus Torvalds wrote:
>> On Mon, Apr 24, 2023 at 2:37?PM Linus Torvalds
>> <torvalds@xxxxxxxxxxxxxxxxxxxx> wrote:
>>>
>>> And I completely refuse to add that trylock hack to paper that over.
>>> The pipe lock is *not* meant for IO.
>>
>> If you want to paper it over, do it other ways.
>>
>> I'd love to just magically fix splice, but hey, that might not be possible.
>
> Don't think it is... At least not trivially.
>
>> But possible fixes papering this over might be to make splice "poison
>> a pipe, and make io_uring falls back on io workers only on pipes that
>> do splice. Make any normal pipe read/write load sane.
>>
>> And no, don't worry about races. If you have the same pipe used for
>> io_uring IO *and* somebody else then doing splice on it and racing,
>> just take the loss and tell people that they might hit a slow case if
>> they do stupid things.
>>
>> Basically, the patch might look like something like
>>
>> - do_pipe() sets FMODE_NOWAIT by default when creating a pipe
>>
>> - splice then clears FMODE_NOWAIT on pipes as they are used
>>
>> and now io_uring sees whether the pipe is playing nice or not.
>>
>> As far as I can tell, something like that would make the
>> 'pipe_buf_confirm()' part unnecessary too, since that's only relevant
>> for splice.
>>
>> A fancier version might be to only do that "splice then clears
>> FMODE_NOWAIT" thing if the other side of the splice has not set
>> FMODE_NOWAIT.
>>
>> Honestly, if the problem is "pipe IO is slow", then splice should not
>> be the thing you optimize for.
>
> I think that'd be an acceptable approach, and would at least fix the
> pure pipe case which I suspect is 99.9% of them, if not more. And yes,
> it'd mean that we don't need to do the ->confirm() change either, as the
> pipe is already tainted at that point.
>
> I'll respin a v2, post, and send in later this merge window.

Something like this. Not tested yet, but wanted to get your feedback
early to avoid issues down the line. Really just the first hunk, as
we're not really expecting f_mode to change and hence could just use
an xchg(). Seems saner to use a cmpxchg though, but maybe that's
too much of both belt and suspenders?

commit f10cbebf4dd7f77b0e87c77bb0191a5a07d5e7ac
Author: Jens Axboe <axboe@xxxxxxxxx>
Date: Mon Apr 24 16:32:55 2023 -0600

splice: clear FMODE_NOWAIT on file if splice/vmsplice is used

In preparation for pipes setting FMODE_NOWAIT on pipes to indicate that
RWF_NOWAIT/IOCB_NOWAIT is fully supported, have splice and vmsplice
clear that file flag. Splice holds the pipe lock around IO and cannot
easily be refactored to avoid that, as splice and pipes are inherently
tied together.

By clearing FMODE_NOWAIT if splice is being used on a pipe, other users
of the pipe will know that the pipe is no longer safe for RWF_NOWAIT
and friends.

Signed-off-by: Jens Axboe <axboe@xxxxxxxxx>

diff --git a/fs/splice.c b/fs/splice.c
index 2c3dec2b6dfa..3ce7c07621e2 100644
--- a/fs/splice.c
+++ b/fs/splice.c
@@ -37,6 +37,29 @@

#include "internal.h"

+/*
+ * Splice doesn't support FMODE_NOWAIT. Since pipes may set this flag to
+ * indicate they support non-blocking reads or writes, we must clear it
+ * here if set to avoid blocking other users of this pipe if splice is
+ * being done on it.
+ */
+static void pipe_clear_nowait(struct file *file)
+{
+ fmode_t new_fmode, old_fmode;
+
+ /*
+ * We could get by with just doing an xchg() here as f_mode should
+ * not change in the file's lifetime, but let's be safe and use
+ * a cmpxchg() instead.
+ */
+ do {
+ old_fmode = READ_ONCE(file->f_mode);
+ if (!(old_fmode & FMODE_NOWAIT))
+ break;
+ new_fmode = old_fmode & ~FMODE_NOWAIT;
+ } while (!try_cmpxchg(&file->f_mode, &old_fmode, new_fmode));
+}
+
/*
* Attempt to steal a page from a pipe buffer. This should perhaps go into
* a vm helper function, it's already simplified quite a bit by the
@@ -1211,10 +1234,16 @@ static long __do_splice(struct file *in, loff_t __user *off_in,
ipipe = get_pipe_info(in, true);
opipe = get_pipe_info(out, true);

- if (ipipe && off_in)
- return -ESPIPE;
- if (opipe && off_out)
- return -ESPIPE;
+ if (ipipe) {
+ if (off_in)
+ return -ESPIPE;
+ pipe_clear_nowait(in);
+ }
+ if (opipe) {
+ if (off_out)
+ return -ESPIPE;
+ pipe_clear_nowait(out);
+ }

if (off_out) {
if (copy_from_user(&offset, off_out, sizeof(loff_t)))
@@ -1311,6 +1340,8 @@ static long vmsplice_to_user(struct file *file, struct iov_iter *iter,
if (!pipe)
return -EBADF;

+ pipe_clear_nowait(file);
+
if (sd.total_len) {
pipe_lock(pipe);
ret = __splice_from_pipe(pipe, &sd, pipe_to_user);
@@ -1339,6 +1370,8 @@ static long vmsplice_to_pipe(struct file *file, struct iov_iter *iter,
if (!pipe)
return -EBADF;

+ pipe_clear_nowait(file);
+
pipe_lock(pipe);
ret = wait_for_space(pipe, flags);
if (!ret)

--
Jens Axboe