Re: WARNING in percpu_ref_kill_and_confirm (2)

From: Pavel Begunkov
Date: Fri Dec 18 2020 - 11:39:16 EST


On 17/12/2020 03:17, Hillf Danton wrote:
> Wed, 16 Dec 2020 13:14:11 -0800
>> syzbot found the following issue on:
>>
>> HEAD commit: 7b1b868e Merge tag 'for-linus' of git://git.kernel.org/pub..
>> git tree: upstream
>> console output: https://syzkaller.appspot.com/x/log.txt?x=1156046b500000
>> kernel config: https://syzkaller.appspot.com/x/.config?x=3416bb960d5c705d
>> dashboard link: https://syzkaller.appspot.com/bug?extid=c9937dfb2303a5f18640
>> compiler: gcc (GCC) 10.1.0-syz 20200507
>> syz repro: https://syzkaller.appspot.com/x/repro.syz?x=1407c287500000
>> C reproducer: https://syzkaller.appspot.com/x/repro.c?x=10ed5f07500000
>>
>> The issue was bisected to:
>>
>> commit 4d004099a668c41522242aa146a38cc4eb59cb1e
>> Author: Peter Zijlstra <peterz@xxxxxxxxxxxxx>
>> Date: Fri Oct 2 09:04:21 2020 +0000
>>
>> lockdep: Fix lockdep recursion
>>
>> bisection log: https://syzkaller.appspot.com/x/bisect.txt?x=14e9d433500000
>> final oops: https://syzkaller.appspot.com/x/report.txt?x=16e9d433500000
>> console output: https://syzkaller.appspot.com/x/log.txt?x=12e9d433500000
>>
>> IMPORTANT: if you fix the issue, please add the following tag to the commit:
>> Reported-by: syzbot+c9937dfb2303a5f18640@xxxxxxxxxxxxxxxxxxxxxxxxx
>> Fixes: 4d004099a668 ("lockdep: Fix lockdep recursion")
>>
>> RAX: ffffffffffffffda RBX: 0000000000000000 RCX: 0000000000441309
>> RDX: 0000000000000002 RSI: 00000000200000c0 RDI: 0000000000003ad1
>> RBP: 000000000000f2ae R08: 0000000000000002 R09: 00000000004002c8
>> R10: 0000000000000000 R11: 0000000000000246 R12: 00000000004021d0
>> R13: 0000000000402260 R14: 0000000000000000 R15: 0000000000000000
>> ------------[ cut here ]------------
>> percpu_ref_kill_and_confirm called more than once on io_ring_ctx_ref_free!
>> WARNING: CPU: 0 PID: 8476 at lib/percpu-refcount.c:382 percpu_ref_kill_and_confirm+0x126/0x180 lib/percpu-refcount.c:382
>> Modules linked in:
>> CPU: 0 PID: 8476 Comm: syz-executor389 Not tainted 5.10.0-rc7-syzkaller #0
>> Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
>> RIP: 0010:percpu_ref_kill_and_confirm+0x126/0x180 lib/percpu-refcount.c:382
>> Code: 5d 08 48 8d 7b 08 48 89 fa 48 c1 ea 03 80 3c 02 00 75 5d 48 8b 53 08 48 c7 c6 00 4b 9d 89 48 c7 c7 60 4a 9d 89 e8 c6 97 f6 04 <0f> 0b 48 b8 00 00 00 00 00 fc ff df 48 89 ea 48 c1 ea 03 80 3c 02
>> RSP: 0018:ffffc9000b94fe10 EFLAGS: 00010086
>> RAX: 0000000000000000 RBX: ffff888011da4580 RCX: 0000000000000000
>> RDX: ffff88801fe84ec0 RSI: ffffffff8158c835 RDI: fffff52001729fb4
>> RBP: ffff88801539f000 R08: 0000000000000001 R09: ffff8880b9e2011b
>> R10: 0000000000000000 R11: 0000000000000000 R12: 0000000000000293
>> R13: 0000000000000000 R14: 0000000000000000 R15: ffff88802de28758
>> FS: 00000000014ab880(0000) GS:ffff8880b9e00000(0000) knlGS:0000000000000000
>> CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
>> CR2: 00007f2a7046b000 CR3: 0000000023368000 CR4: 0000000000350ef0
>> Call Trace:
>> percpu_ref_kill include/linux/percpu-refcount.h:149 [inline]
>> io_ring_ctx_wait_and_kill+0x2b/0x450 fs/io_uring.c:8382
>> io_uring_release+0x3e/0x50 fs/io_uring.c:8420
>> __fput+0x285/0x920 fs/file_table.c:281
>> task_work_run+0xdd/0x190 kernel/task_work.c:151
>> tracehook_notify_resume include/linux/tracehook.h:188 [inline]
>> exit_to_user_mode_loop kernel/entry/common.c:164 [inline]
>> exit_to_user_mode_prepare+0x17e/0x1a0 kernel/entry/common.c:191
>> syscall_exit_to_user_mode+0x38/0x260 kernel/entry/common.c:266
>> entry_SYSCALL_64_after_hwframe+0x44/0xa9
>> RIP: 0033:0x441309
>> Code: e8 5c ae 02 00 48 83 c4 18 c3 0f 1f 80 00 00 00 00 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 0f 83 3b 0a fc ff c3 66 2e 0f 1f 84 00 00 00 00
>> RSP: 002b:00007ffed6545d38 EFLAGS: 00000246 ORIG_RAX: 00000000000001a9
>> RAX: fffffffffffffff4 RBX: 0000000000000000 RCX: 0000000000441309
>> RDX: 0000000000000002 RSI: 00000000200000c0 RDI: 0000000000003ad1
>> RBP: 000000000000f2ae R08: 0000000000000002 R09: 00000000004002c8
>> R10: 0000000000000000 R11: 0000000000000246 R12: 00000000004021d0
>> R13: 0000000000402260 R14: 0000000000000000 R15: 0000000000000000
>
> Avoid double kill by checking ctx health.

Let's focus on _how_ it can happen. Refs may be killed by
__io_uring_register(), but that path holds a ref to the file, so
io_uring_release() -> io_ring_ctx_wait_and_kill() shouldn't even happen
while it runs. And once io_ring_ctx_wait_and_kill() is called, fdget()
for that ring is no longer possible. That's assuming no other bugs are
present.
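
To spell out the lifetime argument, here is a deliberately simplified
userspace sketch. The toy_* names are hypothetical and this is not the
kernel's struct file or the real fdget()/fput(); it only models the rule
that ->release() runs on the last put, so a holder of a file ref cannot
race with it.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy model only: hypothetical names, not the kernel's struct file. */
struct toy_file {
	int  f_count;   /* outstanding references to the file */
	bool released;  /* set once ->release() would have run */
};

/* Stand-in for fdget(): fails once the file is already gone. */
static bool toy_fdget(struct toy_file *f)
{
	if (f->f_count == 0)
		return false;
	f->f_count++;
	return true;
}

/* Stand-in for fput()/fdput(): ->release() runs only on the last put. */
static void toy_fput(struct toy_file *f)
{
	assert(f->f_count > 0);
	if (--f->f_count == 0)
		f->released = true;  /* io_uring_release() would run here */
}
```

In this model, a close() from userspace while the register path still
holds its ref only drops the count; release is deferred until the
register path puts its ref, and no new ref can be taken after that.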

We want to solve the problem rather than mask it. So, can it really
happen, or is the problem somewhere else?

>
> --- a/fs/io_uring.c
> +++ b/fs/io_uring.c
> @@ -8379,7 +8379,13 @@ static void io_ring_exit_work(struct wor
>  static void io_ring_ctx_wait_and_kill(struct io_ring_ctx *ctx)
>  {
>  	mutex_lock(&ctx->uring_lock);
> -	percpu_ref_kill(&ctx->refs);
> +	/*
> +	 * try to avoid killing dead ctx, see the comments for dropping
> +	 * ring mutex in __io_uring_register()
> +	 */
> +	if (!percpu_ref_is_dying(&ctx->refs))
> +		percpu_ref_kill(&ctx->refs);
> +
>  	mutex_unlock(&ctx->uring_lock);
>
>  	io_kill_timeouts(ctx, NULL);
>
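
For illustration, a minimal userspace sketch of what the check above
does. The toy_ref type and helpers are hypothetical stand-ins, not the
percpu_ref API; the warning counter stands in for the WARN in the report.
The guard makes the second kill silent, but it says nothing about who
did the first kill.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy model only: hypothetical names, not struct percpu_ref. */
struct toy_ref {
	bool dying;     /* set by the first kill */
	int  warnings;  /* counts "called more than once" events */
};

/* Unguarded kill: a second call is reported as a warning. */
static void toy_ref_kill(struct toy_ref *r)
{
	if (r->dying)
		r->warnings++;
	r->dying = true;
}

/* Guarded kill, as in the quoted patch: skip if already dying. */
static void toy_ref_kill_guarded(struct toy_ref *r)
{
	if (!r->dying)
		toy_ref_kill(r);
}
```

With the guard, an earlier unexpected kill goes unnoticed, which is the
masking concern raised above.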

--
Pavel Begunkov