Re: [External] Re: [PATCH v3 2/3] io_uring: avoid ring quiesce while registering/unregistering eventfd

From: Usama Arif
Date: Thu Feb 03 2022 - 13:26:47 EST




On 03/02/2022 17:56, Jens Axboe wrote:
On 2/3/22 10:41 AM, Usama Arif wrote:
@@ -1726,13 +1732,24 @@ static inline struct io_uring_cqe *io_get_cqe(struct io_ring_ctx *ctx)
return &rings->cqes[tail & mask];
}
-static inline bool io_should_trigger_evfd(struct io_ring_ctx *ctx)
+static void io_eventfd_signal(struct io_ring_ctx *ctx)
{
- if (likely(!ctx->cq_ev_fd))
- return false;
+ struct io_ev_fd *ev_fd;
+
+ rcu_read_lock();
+ /* rcu_dereference ctx->io_ev_fd once and use it for both for checking and eventfd_signal */
+ ev_fd = rcu_dereference(ctx->io_ev_fd);
+
+ if (likely(!ev_fd))
+ goto out;
if (READ_ONCE(ctx->rings->cq_flags) & IORING_CQ_EVENTFD_DISABLED)
- return false;
- return !ctx->eventfd_async || io_wq_current_is_worker();
+ goto out;
+
+ if (!ctx->eventfd_async || io_wq_current_is_worker())
+ eventfd_signal(ev_fd->cq_ev_fd, 1);
+
+out:
+ rcu_read_unlock();
}

Like Pavel pointed out, we still need the fast path (of not having an
event fd registered at all) to just do the cheap check and not need rcu
lock/unlock. Outside of that, I think this looks fine.


Hmm, maybe i didn't understand you and Pavel correctly. Are you suggesting to do the below diff over patch 3? I dont think that would be correct, as it is possible that just after checking if ctx->io_ev_fd is present unregister can be called by another thread and set ctx->io_ev_fd to NULL that would cause a NULL pointer exception later? In the current patch, the check of whether ev_fd exists happens as the first thing after rcu_read_lock and the rcu_read_lock are extremely cheap i believe.

diff --git a/fs/io_uring.c b/fs/io_uring.c
index 25ed86533910..0cf282fba14d 100644
--- a/fs/io_uring.c
+++ b/fs/io_uring.c
@@ -1736,12 +1736,13 @@ static void io_eventfd_signal(struct io_ring_ctx *ctx)
{
struct io_ev_fd *ev_fd;

+ if (likely(!ctx->io_ev_fd))
+ return;
+
rcu_read_lock();
/* rcu_dereference ctx->io_ev_fd once and use it for both for checking and eventfd_signal */
ev_fd = rcu_dereference(ctx->io_ev_fd);

- if (likely(!ev_fd))
- goto out;
if (READ_ONCE(ctx->rings->cq_flags) & IORING_CQ_EVENTFD_DISABLED)
goto out;




static int io_eventfd_unregister(struct io_ring_ctx *ctx)
{
- if (ctx->cq_ev_fd) {
- eventfd_ctx_put(ctx->cq_ev_fd);
- ctx->cq_ev_fd = NULL;
- return 0;
+ struct io_ev_fd *ev_fd;
+ int ret;
+
+ mutex_lock(&ctx->ev_fd_lock);
+ ev_fd = rcu_dereference_protected(ctx->io_ev_fd, lockdep_is_held(&ctx->ev_fd_lock));
+ if (!ev_fd) {
+ ret = -ENXIO;
+ goto out;
}
+ synchronize_rcu();
+ eventfd_ctx_put(ev_fd->cq_ev_fd);
+ kfree(ev_fd);
+ rcu_assign_pointer(ctx->io_ev_fd, NULL);
+ ret = 0;
- return -ENXIO;
+out:
+ mutex_unlock(&ctx->ev_fd_lock);
+ return ret;
}

synchronize_rcu() can take a long time, and I think this is in the wrong
spot. It should be on the register side, IFF we need to expedite the
completion of a previous event fd unregistration. If we do it that way,
at least it'll only happen if it's necessary. What do you think?



How about the approach in v4? so switching back to call_rcu as in v2 and if ctx->io_ev_fd is NULL then we call rcu_barrier to make sure all rcu callbacks are finished and check for NULL again.

Thanks!
Usama