Re: WARNING in percpu_ref_kill_and_confirm

From: Linus Torvalds
Date: Mon Apr 22 2019 - 12:32:28 EST


On Mon, Apr 22, 2019 at 9:06 AM syzbot
<syzbot+10d25e23199614b7721f@xxxxxxxxxxxxxxxxxxxxxxxxx> wrote:
>
>
> The bug was bisected to:
>
> commit 38e7571c07be01f9f19b355a9306a4e3d5cb0f5b
> Author: Linus Torvalds <torvalds@xxxxxxxxxxxxxxxxxxxx>
> Date: Fri Mar 8 22:48:40 2019 +0000
>
> Merge tag 'io_uring-2019-03-06' of git://git.kernel.dk/linux-block
>
> percpu_ref_kill_and_confirm called more than once on io_ring_ctx_ref_free!

So I don't see how that happens in the original code (because
__io_uring_register() is called with the uring_lock held), but let's
see.

HOWEVER.

I do see how it happens in the latest kernel, as of commit
b19062a56726 ("io_uring: fix possible deadlock between
io_uring_{enter,register}"), where the code explicitly drops the mutex
in order to wait for other uring users to finish.
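
To make the sequence concrete, here is a small userspace model of it. This is only a sketch, not kernel code: every toy_* name is hypothetical, the "mutex" is a plain flag, and the reinit step at the end of the locked path is an assumption about how the register path restores the ref. The model counts a second kill instead of emitting the kernel's warning.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy stand-in for a percpu ref: just the state this example needs. */
struct toy_ref {
	long count;       /* outstanding references, including the initial one */
	bool dying;       /* set by the first kill */
	int double_kills; /* kills issued against an already dying ref */
};

/* Toy stand-in for the ring context (hypothetical, not io_ring_ctx). */
struct toy_ctx {
	struct toy_ref refs;
	bool uring_lock;  /* true while the "mutex" is held */
};

static void toy_ctx_init(struct toy_ctx *ctx)
{
	ctx->refs.count = 1;
	ctx->refs.dying = false;
	ctx->refs.double_kills = 0;
	ctx->uring_lock = false;
}

static void toy_lock(struct toy_ctx *ctx)
{
	assert(!ctx->uring_lock);
	ctx->uring_lock = true;
}

static void toy_unlock(struct toy_ctx *ctx)
{
	assert(ctx->uring_lock);
	ctx->uring_lock = false;
}

/* Counts a repeated kill where the kernel would warn. */
static void toy_ref_kill(struct toy_ref *ref)
{
	if (ref->dying) {
		ref->double_kills++;
		return;
	}
	ref->dying = true;
	ref->count--; /* drop the initial reference */
}

/* Assumed model of bringing the ref back to life after the wait. */
static void toy_ref_reinit(struct toy_ref *ref)
{
	ref->dying = false;
	ref->count = 1;
}

/* Lock is dropped after the kill: a second caller can get in and kill again. */
static int toy_simulate_dropped_lock(void)
{
	struct toy_ctx ctx;

	toy_ctx_init(&ctx);

	/* Caller A: kills under the lock, then drops it to wait. */
	toy_lock(&ctx);
	toy_ref_kill(&ctx.refs);
	toy_unlock(&ctx);

	/* Caller B: takes the lock while A is still waiting. */
	toy_lock(&ctx);
	toy_ref_kill(&ctx.refs);
	toy_unlock(&ctx);

	return ctx.refs.double_kills;
}

/* Lock is held across kill and reinit: callers are fully serialized. */
static int toy_simulate_held_lock(void)
{
	struct toy_ctx ctx;
	int i;

	toy_ctx_init(&ctx);
	for (i = 0; i < 2; i++) {
		toy_lock(&ctx);
		toy_ref_kill(&ctx.refs);
		toy_ref_reinit(&ctx.refs);
		toy_unlock(&ctx);
	}
	return ctx.refs.double_kills;
}
```

In this model toy_simulate_dropped_lock() reports one double kill and toy_simulate_held_lock() reports none, which matches the difference between the two versions of the code described above.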

So Jens, I think that commit was buggy. I suspect that
io_uring_register() should perhaps do something like

--- a/fs/io_uring.c
+++ b/fs/io_uring.c
@@ -2934,7 +2934,10 @@ static int __io_uring_register(struct io_ring_ctx *ctx, unsigned opcode,
 {
 	int ret;
 
+	if (!percpu_ref_tryget(&ctx->refs))
+		return -EBUSY;
 	percpu_ref_kill(&ctx->refs);
+	percpu_ref_put(&ctx->refs);
 
 	/*
 	 * Drop uring mutex before waiting for references to exit. If another
to guarantee that it's the *only* case of io_uring_register() doing that kill.
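
The tryget/kill/put ordering in that hunk can be played through in a userspace toy as well. Again this is a sketch under assumptions, not the kernel implementation: the toy_* names are hypothetical, and toy_ref_tryget() is modelled as "succeeds whenever the count is nonzero".

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Toy stand-in for a percpu ref: just the state this example needs. */
struct toy_ref {
	long count;       /* outstanding references, including the initial one */
	bool dying;       /* set by the first kill */
	int double_kills; /* kills issued against an already dying ref */
};

/* Modelled as: succeeds only while the count is still nonzero. */
static bool toy_ref_tryget(struct toy_ref *ref)
{
	if (ref->count == 0)
		return false;
	ref->count++;
	return true;
}

static void toy_ref_put(struct toy_ref *ref)
{
	assert(ref->count > 0);
	ref->count--;
}

/* Counts a repeated kill where the kernel would warn. */
static void toy_ref_kill(struct toy_ref *ref)
{
	if (ref->dying) {
		ref->double_kills++;
		return;
	}
	ref->dying = true;
	ref->count--; /* drop the initial reference */
}

/* Same ordering as the hunk above: tryget, kill, put. */
static int toy_register_guarded(struct toy_ref *ref)
{
	if (!toy_ref_tryget(ref))
		return -EBUSY;
	toy_ref_kill(ref);
	toy_ref_put(ref);
	return 0;
}

/*
 * Two back-to-back guarded calls with extra_users other references
 * outstanding; returns what the second caller got back.
 */
static int toy_second_caller_result(long extra_users, int *double_kills)
{
	struct toy_ref ref = { 1 + extra_users, false, 0 };
	int ret;

	(void)toy_register_guarded(&ref); /* first caller */
	ret = toy_register_guarded(&ref); /* second caller */
	*double_kills = ref.double_kills;
	return ret;
}
```

With no other references outstanding, the second caller in this model gets -EBUSY and nothing is killed twice. With one extra reference still held, the model's tryget succeeds and the second caller kills again, so the sketch does not settle whether the guard covers the window in which other uring users have not finished yet.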

Hmm?

Linus