Re: WARNING in percpu_ref_kill_and_confirm

From: Jens Axboe
Date: Mon Apr 22 2019 - 12:28:37 EST


On 4/22/19 10:23 AM, Linus Torvalds wrote:
> On Mon, Apr 22, 2019 at 9:06 AM syzbot
> <syzbot+10d25e23199614b7721f@xxxxxxxxxxxxxxxxxxxxxxxxx> wrote:
>>
>>
>> The bug was bisected to:
>>
>> commit 38e7571c07be01f9f19b355a9306a4e3d5cb0f5b
>> Author: Linus Torvalds <torvalds@xxxxxxxxxxxxxxxxxxxx>
>> Date: Fri Mar 8 22:48:40 2019 +0000
>>
>> Merge tag 'io_uring-2019-03-06' of git://git.kernel.dk/linux-block
>>
>> percpu_ref_kill_and_confirm called more than once on io_ring_ctx_ref_free!
>
> So I don't see how that happens in the original code (because
> __io_uring_register() is called with the uring_lock held), but let's
> see.
>
> HOWEVER.
>
> I do see how it happens now as of the latest kernel as of commit
> b19062a56726 ("io_uring: fix possible deadlock between
> io_uring_{enter,register}") where the code explicitly drops the mutex
> in order to wait for other uring users to finish.
>
> So Jens, I think that commit was buggy. I suspect that
> io_uring_register() should perhaps do something like
>
> --- a/fs/io_uring.c
> +++ b/fs/io_uring.c
> @@ -2934,7 +2934,10 @@ static int __io_uring_register(struct
> io_ring_ctx *ctx, unsigned opcode,
> {
> int ret;
>
> + if (!percpu_ref_tryget(&ctx->refs))
> + return -EBUSY;
> percpu_ref_kill(&ctx->refs);
> + percpu_ref_put(&ctx->refs);
>
> /*
> * Drop uring mutex before waiting for references to exit. If another
>
> to guarantee that it's the *only* case of io_uring_register() doing that kill.
>
> Hmm?

Just sent out something as well. I think we can get by with just
checking if it's dying, or we can go the route of what you did which is
actually very similar to what the earlier versions did. Both versions
should fix the issue.

I'll test just to be totally sure.

--
Jens Axboe