Re: [PATCH v2 4/4] io_uring: pre-increment f_pos on rw

From: Pavel Begunkov
Date: Mon Feb 21 2022 - 13:14:46 EST


On 2/21/22 14:16, Dylan Yudaken wrote:
In read/write ops, pre-increment f_pos when no offset is specified, and
then attempt to fix up the position after the IO completes if it
transferred less than expected. This fixes the problem where multiple
queued-up IOs all obtain the same f_pos, and so perform the same read/write.
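
Here is a minimal single-threaded user-space sketch of the difference. It is not kernel code, and `struct fake_file`, `snapshot_pos()` and `reserve_pos()` are hypothetical names. Taking the position at issue time hands every queued request the same offset. Pre-incrementing gives each request its own range.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for struct file: only the shared position. */
struct fake_file {
	int64_t f_pos;
};

/*
 * Old behaviour (sketch): each queued request just snapshots f_pos,
 * so requests queued before any completes all see the same offset.
 */
static int64_t snapshot_pos(const struct fake_file *f)
{
	return f->f_pos;
}

/*
 * Patched behaviour (sketch): the request claims [pos, pos + len)
 * by advancing f_pos at issue time, before the I/O completes.
 */
static int64_t reserve_pos(struct fake_file *f, uint64_t len)
{
	int64_t pos = f->f_pos;

	f->f_pos += (int64_t)len;
	return pos;
}
```

Three 4096-byte requests queued with reserve_pos() get offsets 0, 4096 and 8192. This leaves f_pos at 12288 before any of them has completed.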

This is still not as consistent as sync r/w, as it is able to advance the
file offset past the end of the file. It seems it would be quite a
performance hit to work around this limitation (for example, by keeping
track of concurrent operations), and the downside does not seem too
problematic.

Fixing up f_pos after completion at least means that when a single
operation is run, the position will be consistent.
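
The fix-up condition can be sketched the same way, again in user space with hypothetical names. Here `fixup_pos()` is written in terms of the reserved start offset rather than the completed `ki_pos`. It matches the patch's check provided `ki_pos` has advanced by `actual`, which that check implies. The sketch also reproduces the EOF limitation described above.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for struct file: only the shared position. */
struct fake_file {
	int64_t f_pos;
};

/* Claim [pos, pos + len) by advancing f_pos at issue time. */
static int64_t reserve_pos(struct fake_file *f, uint64_t len)
{
	int64_t pos = f->f_pos;

	f->f_pos += (int64_t)len;
	return pos;
}

/*
 * On a short transfer, roll f_pos back to the end of what was actually
 * transferred, but only if f_pos is still exactly where this request
 * left it (start + expected). Otherwise another request moved it.
 */
static void fixup_pos(struct fake_file *f, int64_t start,
		      uint64_t expected, uint64_t actual)
{
	if (actual >= expected)
		return;
	if (f->f_pos == start + (int64_t)expected)
		f->f_pos = start + (int64_t)actual;
	/* else: someone else advanced f_pos, leave it alone */
}
```

Take a 100-byte file. A lone 4096-byte read is rolled back to 100. Now queue two 4096-byte reads back to back, with the first completing first. The first fix-up is skipped because f_pos is already 8192. The second fix-up rolls f_pos back only to 4096, which is past EOF.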

Co-developed-by: Jens Axboe <axboe@xxxxxxxxx>
Signed-off-by: Jens Axboe <axboe@xxxxxxxxx>
Signed-off-by: Dylan Yudaken <dylany@xxxxxx>
---
fs/io_uring.c | 81 ++++++++++++++++++++++++++++++++++++++++++---------
1 file changed, 68 insertions(+), 13 deletions(-)

diff --git a/fs/io_uring.c b/fs/io_uring.c
index abd8c739988e..a951d0754899 100644
--- a/fs/io_uring.c
+++ b/fs/io_uring.c
@@ -3066,21 +3066,71 @@ static inline void io_rw_done(struct kiocb *kiocb, ssize_t ret)

[...]

+ return false;
}
}
- return is_stream ? NULL : &kiocb->ki_pos;
+ *ppos = is_stream ? NULL : &kiocb->ki_pos;
+ return false;
+}
+
+static inline void
+io_kiocb_done_pos(struct io_kiocb *req, struct kiocb *kiocb, u64 actual)

That's a lot of inlining; I wouldn't be surprised if the compiler even
refused to do it.

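static void __io_kiocb_done_pos(struct io_kiocb *req, struct kiocb *kiocb,
				u64 actual)
{
	/* rest of it */
}

static inline void io_kiocb_done_pos(struct io_kiocb *req,
				     struct kiocb *kiocb, u64 actual)
{
	if (!(req->flags & REQ_F_CUR_POS))
		return;
	__io_kiocb_done_pos(req, kiocb, actual);
}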
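
The split suggested above is the usual fast-path/slow-path pattern. Only the cheap flag test is inlined into callers, and the rest stays out of line. Below is a compilable user-space sketch, assuming GCC or Clang. `__attribute__((noinline))` and `__builtin_expect()` stand in for the kernel's `noinline` and `likely()`. `struct hyp_req`, `HYP_F_CUR_POS` and both function names are hypothetical. In the kernel the slow path would conventionally get the `__` prefix.

```c
#include <assert.h>

/* Hypothetical stand-in for REQ_F_CUR_POS. */
#define HYP_F_CUR_POS (1u << 0)

struct hyp_req {
	unsigned int flags;
	int slow_calls;	/* counts slow-path entries, for illustration only */
};

/* Out of line: the bulk of the position fix-up would live here. */
static __attribute__((noinline)) void hyp_done_pos_slow(struct hyp_req *req)
{
	req->slow_calls++;
}

/* Inlined into callers: just the cheap, common-case flag test. */
static inline void hyp_done_pos(struct hyp_req *req)
{
	/* No ';' after the condition, or the return becomes unconditional. */
	if (__builtin_expect(!(req->flags & HYP_F_CUR_POS), 1))
		return;
	hyp_done_pos_slow(req);
}
```

Requests without the flag never reach the out-of-line function, so the common path costs one test and a predicted branch.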

io_kiocb_update_pos() is huge as well.

+{
+ u64 expected;
+
+ if (likely(!(req->flags & REQ_F_CUR_POS)))
+ return;
+
+ expected = req->rw.len;
+ if (actual >= expected)
+ return;
+
+ /*
+ * It's not guaranteed to be safe to lock here. The assumption is that
+ * if we cannot lock the position, it is about to change, and if it is
+ * about to change we can't update it anyway.
+ */
+ if (req->file->f_mode & FMODE_ATOMIC_POS
+ && !mutex_trylock(&req->file->f_pos_lock))
+ return;
+
+ /*
+ * Now we want to move the position back, but only if everything is
+ * still consistent with how we left it originally.
+ */
+ if (req->file->f_pos == kiocb->ki_pos + (expected - actual))
+ req->file->f_pos = kiocb->ki_pos;

I wonder whether it's good enough / safe to just assign it, considering
that the request was executed outside of locks. vfs_seek()?

+
+ /* else something else messed with f_pos and we can't do anything */
+
+ if (req->file->f_mode & FMODE_ATOMIC_POS)
+ mutex_unlock(&req->file->f_pos_lock);
}
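
For reference, the trylock-and-compare logic above can be mirrored in user space with POSIX threads. This is only an analogue. `struct hyp_file` and `hyp_fixup_pos()` are hypothetical. Also note that `pthread_mutex_trylock()` returns 0 on success, whereas the kernel's `mutex_trylock()` returns 1.

```c
#include <assert.h>
#include <pthread.h>
#include <stdint.h>

/* Hypothetical user-space stand-in for struct file. */
struct hyp_file {
	pthread_mutex_t pos_lock;	/* plays the role of f_pos_lock */
	int atomic_pos;			/* plays the role of FMODE_ATOMIC_POS */
	int64_t f_pos;
};

/*
 * Returns 1 if f_pos was rolled back. ki_pos is the position after the
 * I/O (start + actual), as in the patch.
 */
static int hyp_fixup_pos(struct hyp_file *f, int64_t ki_pos,
			 uint64_t expected, uint64_t actual)
{
	int fixed = 0;

	if (actual >= expected)
		return 0;
	/* Lock busy: assume f_pos is being changed and give up. */
	if (f->atomic_pos && pthread_mutex_trylock(&f->pos_lock) != 0)
		return 0;
	/* Roll back only if f_pos is still where this request left it. */
	if (f->f_pos == ki_pos + (int64_t)(expected - actual)) {
		f->f_pos = ki_pos;
		fixed = 1;
	}
	if (f->atomic_pos)
		pthread_mutex_unlock(&f->pos_lock);
	return fixed;
}
```

If another thread holds the lock, the fix-up is silently skipped and f_pos keeps the over-advanced value. That is the trade-off the comment in the patch accepts.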

Do we even care about races while reading it? E.g.
pos = READ_ONCE();
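
In user-space C11 terms, READ_ONCE() is roughly comparable to a relaxed atomic load. It keeps the compiler from tearing or re-reading the value, but it adds no locking. Below is a hypothetical sketch (`struct hyp_shared_pos` and `hyp_pos_unchanged()` are made-up names) that takes a single snapshot of the position before comparing it.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdint.h>

/* Hypothetical shared position, possibly updated concurrently elsewhere. */
struct hyp_shared_pos {
	_Atomic int64_t f_pos;
};

/*
 * Read f_pos exactly once with a relaxed atomic load and compare only
 * that snapshot against the end of the range this request reserved.
 */
static int hyp_pos_unchanged(struct hyp_shared_pos *p, int64_t reserved_end)
{
	int64_t pos = atomic_load_explicit(&p->f_pos, memory_order_relaxed);

	return pos == reserved_end;
}
```

A snapshot like this only makes the read itself well-defined. The window between the comparison and any later assignment stays open unless the position lock is held.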

- ppos = io_kiocb_update_pos(req, kiocb);
-
ret = rw_verify_area(READ, req->file, ppos, req->result);
if (unlikely(ret)) {
kfree(iovec);
+ io_kiocb_done_pos(req, kiocb, 0);

Why do we update it on failure?

[...]

- ppos = io_kiocb_update_pos(req, kiocb);
-
ret = rw_verify_area(WRITE, req->file, ppos, req->result);
if (unlikely(ret))
goto out_free;
@@ -3858,6 +3912,7 @@ static int io_write(struct io_kiocb *req, unsigned int issue_flags)
return ret ?: -EAGAIN;
}
out_free:
+ io_kiocb_done_pos(req, kiocb, 0);

Looks weird. It appears we don't need it on failure, and
successes are covered by kiocb_done() / ->ki_complete.

/* it's reportedly faster than delegating the null check to kfree() */
if (iovec)
kfree(iovec);

--
Pavel Begunkov