Re: [PATCH] io_uring: Optimization of buffered random write

From: Jens Axboe
Date: Wed Apr 19 2023 - 09:33:07 EST


On 4/19/23 3:22?AM, luhongfei wrote:
> diff --git a/io_uring/io_uring.c b/io_uring/io_uring.c
> index 4a865f0e85d0..64bb91beb4d6
> --- a/io_uring/io_uring.c
> +++ b/io_uring/io_uring.c
> @@ -2075,8 +2075,23 @@ static inline void io_queue_sqe(struct io_kiocb *req)
> __must_hold(&req->ctx->uring_lock)
> {
> int ret;
> + bool is_write;
>
> - ret = io_issue_sqe(req, IO_URING_F_NONBLOCK|IO_URING_F_COMPLETE_DEFER);
> + switch (req->opcode) {
> + case IORING_OP_WRITEV:
> + case IORING_OP_WRITE_FIXED:
> + case IORING_OP_WRITE:
> + is_write = true;
> + break;
> + default:
> + is_write = false;
> + break;
> + }
> +
> + if (!is_write || (req->rw.kiocb.ki_flags & IOCB_DIRECT))
> + ret = io_issue_sqe(req, IO_URING_F_NONBLOCK|IO_URING_F_COMPLETE_DEFER);
> + else
> + ret = io_issue_sqe(req, 0);
>
> /*
> * We async punt it if the file wasn't marked NOWAIT, or if the file

We really can't just do that, implicitly. What you are doing is making
any of write synchronous. What are you writing to in terms of device or
file? If file, what file system is being used? Curious if the target
supports async buffered writes, guessing it does not which is why you
see io-wq activity for all of them.

That said, I did toss out a test patch a while back that explicitly sets
up the ring such that we'll do blocking IO rather than do a non-blocking
attempt and then punt it if that fails. And I do think there's a use
case for that, in case you just want to use io_uring for batched
syscalls and don't care about if you end up blocking for some IO.

Let's do a primer on what happens for io_uring issue:

1) Non-blocking issue is attempted for IO. If successful, we're done for
now.

2) Case 1 failed. Now we have two options
a) We can poll the file. We arm poll, and we're done for now
until that triggers.
b) File cannot be polled, we punt to io-wq which then does a
blocking attempt.

For case 2b, this is the one where we could've just done a blocking
attempt initially if the ring was setup with a flag explicitly saying
that's what the application wants. Or io_uring_enter() had a flag passed
in that explicitly said this is what the applications wants. I suspect
we'll want both, to cover both SQPOLL and !SQPOLL.

I'd recommend we still retain non-blocking issue for pollable files, as
you could very quickly block forever otherwise. Imagine an empty pipe
and a read issued to it in the blocking mode.

A solution like that would cater to your case too, without potentially
breaking a lot of things like your patch could. The key here is the
explicit nature of it, we cannot just go and make odd assumptions about
a particular opcode type (writes) and ring type (SQPOLL) and say "oh
this one is fine for just ignoring blocking off the issue path".

--
Jens Axboe