Re: [External] Re: [PATCH v4 2/3] io_uring: avoid ring quiesce while registering/unregistering eventfd

From: Usama Arif
Date: Thu Feb 03 2022 - 18:37:26 EST




On 03/02/2022 19:12, Jens Axboe wrote:
On 2/3/22 12:05 PM, Usama Arif wrote:


On 03/02/2022 18:49, Jens Axboe wrote:
On 2/3/22 11:24 AM, Usama Arif wrote:
-static inline bool io_should_trigger_evfd(struct io_ring_ctx *ctx)
+static void io_eventfd_signal(struct io_ring_ctx *ctx)
{
- if (likely(!ctx->cq_ev_fd))
- return false;
+ struct io_ev_fd *ev_fd;
+
+ rcu_read_lock();
+ /* rcu_dereference ctx->io_ev_fd once and use it for both for checking and eventfd_signal */
+ ev_fd = rcu_dereference(ctx->io_ev_fd);
+
+ if (likely(!ev_fd))
+ goto out;
if (READ_ONCE(ctx->rings->cq_flags) & IORING_CQ_EVENTFD_DISABLED)
- return false;
- return !ctx->eventfd_async || io_wq_current_is_worker();
+ goto out;
+
+ if (!ctx->eventfd_async || io_wq_current_is_worker())
+ eventfd_signal(ev_fd->cq_ev_fd, 1);
+
+out:
+ rcu_read_unlock();
}

This still needs what we discussed in v3, something ala:

/*
* This will potential race with eventfd registration, but that's
* always going to be the case if there is IO inflight while an eventfd
* descriptor is being registered.
*/
if (!rcu_dereference_raw(ctx->io_ev_fd))
return;

rcu_read_lock();

Hmm, so i am not so worried about the registeration, but actually
worried about unregisteration.
If after the check and before the rcu_read_lock, the eventfd is
unregistered won't we get a NULL pointer exception at
eventfd_signal(ev_fd->cq_ev_fd, 1)?

You need to check it twice, that's a hard requirement. The first racy
check is safe because we don't care if we miss a notification, once
inside rcu_read_lock() it needs to be done properly of course. Like you
do below, that's how it should be done.

I wonder if we can get away with assigning ctx->io_ev_fd to NULL when we
do the call_rcu(). The struct itself will remain valid as long as we're
under rcu_read_lock() protection, so I think we'd be fine? If we do
that, then we don't need any rcu_barrier() or synchronize_rcu() calls,
as we can register a new one while the previous one is still being
killed.

Hmm?


We would have to remove the check that ctx->io_ev_fd != NULL. That we
would also result in 2 successive calls to io_eventfd_register without
any unregister in between being successful? Which i dont think is the
right behaviour?

I think the likelihood of hitting the rcu_barrier itself is quite low,
so probably the cost is low as well.

Yeah it might very well be. To make what I suggested work, we'd need a
way to mark the io_ev_fd as going away. Which would be feasible, as we
know the memory will remain valid for us to check. So it could
definitely work, you'd just need a check for that.

Thanks, will do that this in the next patchset with the above
io_eventfd_signal changes if those look ok as well?

The code you pasted looked good. Consider the "is unregistration in
progress" suggestion as well, as it would be nice to avoid any kind of
rcu synchronization if at all possible.



Thanks for the review comments! I think all of them should have been addressed now in v5. I also removed ring quiesce from io_uring_register as the remaining 2 opcodes don't need them (Thanks Pavel for confirming that!)

Regards,
Usama