RE: [syzbot] WARNING in exit_tasks_rcu_finish

From: Zhang, Qiang1
Date: Mon Jun 13 2022 - 14:07:38 EST


> syzbot <syzbot+9bb26e7c5e8e4fa7e641@xxxxxxxxxxxxxxxxxxxxxxxxx> writes:
>
> > Hello,
> >
> > syzbot found the following issue on:
> >
> > HEAD commit: 6d0c80680317 Add linux-next specific files for 20220610
> > git tree: linux-next
> > console output:
> > https://syzkaller.appspot.com/x/log.txt?x=13b52c2ff00000
> > kernel config:
> > https://syzkaller.appspot.com/x/.config?x=a30d6e3e814e5931
> > dashboard link: https://syzkaller.appspot.com/bug?extid=9bb26e7c5e8e4fa7e641
> > compiler: gcc (Debian 10.2.1-6) 10.2.1 20210110, GNU ld (GNU Binutils for Debian) 2.35.2
> >
> > Unfortunately, I don't have any reproducer for this issue yet.
>
> I don't understand what is going on in linux-next kernel/rcu/tasks.h
> looks different than in Linus's tree. Paul does that mean you have
> some staged rcu changes?

>Less than 100 RCU-related patches in -rcu, so not all that bad. ;-)
>
>But yes, this could possibly be an issue in one of those patches.

> Eric
>
>
> > IMPORTANT: if you fix the issue, please add the following tag to the commit:
> > Reported-by: syzbot+9bb26e7c5e8e4fa7e641@xxxxxxxxxxxxxxxxxxxxxxxxx
> >
> > ------------[ cut here ]------------
> > WARNING: CPU: 1 PID: 28639 at kernel/rcu/tasks.h:1664
> > exit_tasks_rcu_finish_trace kernel/rcu/tasks.h:1664 [inline]
> > WARNING: CPU: 1 PID: 28639 at kernel/rcu/tasks.h:1664
> > exit_tasks_rcu_finish+0x122/0x1b0 kernel/rcu/tasks.h:1006

>The usual way for this warning to trigger is for these a task to exit while in an RCU Tasks Trace read-side critical section:
>
> rcu_read_lock_trace();
> do_something_that_causes_task_exit();
>

Hi Paul, wether the following scenarios be considered

rcu_read_unlock_trace_special
->if (trs.b.blocked)
->raw_spin_lock_irqsave_rcu_node
->list_del_init(&t->trc_blkd_node)
->WRITE_ONCE(t->trc_reader_special.b.blocked, false)
->raw_spin_unlock_irqrestore_rcu_node
->Inerrrupt
->schedule
->rcu_note_context_switch
->rcu_tasks_trace_qs
If (___rttq_nesting && !READ_ONCE((t)->trc_reader_special.b.blocked)
/*___rttq_nesting ==1 && (t)->trc_reader_special.b.blocked =false*/
rcu_tasks_trace_qs_blkd(t)
->WRITE_ONCE(t->trc_reader_nesting, 0)
.......
-> exit_tasks_rcu_finish

Whether the following patch can fix it, or what am I missing?
Any thoughts?

diff --git a/kernel/rcu/tasks.h b/kernel/rcu/tasks.h
index f1209ce621c5..c607e4c914d3 100644
--- a/kernel/rcu/tasks.h
+++ b/kernel/rcu/tasks.h
@@ -1247,6 +1247,7 @@ void rcu_read_unlock_trace_special(struct task_struct *t)
struct rcu_tasks_percpu *rtpcp;
union rcu_special trs;

+ WRITE_ONCE(t->trc_reader_nesting, 0);
// Open-coded full-word version of rcu_ld_need_qs().
smp_mb(); // Enforce full grace-period ordering.
trs = smp_load_acquire(&t->trc_reader_special);
@@ -1267,7 +1268,6 @@ void rcu_read_unlock_trace_special(struct task_struct *t)
WRITE_ONCE(t->trc_reader_special.b.blocked, false);
raw_spin_unlock_irqrestore_rcu_node(rtpcp, flags);
}
- WRITE_ONCE(t->trc_reader_nesting, 0);
}
EXPORT_SYMBOL_GPL(rcu_read_unlock_trace_special);



Thanks
Zqiang

> > Modules linked in:
> > CPU: 1 PID: 28639 Comm: syz-executor.3 Not tainted
> > 5.19.0-rc1-next-20220610-syzkaller #0 Hardware name: Google Google
> > Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
> > RIP: 0010:exit_tasks_rcu_finish_trace kernel/rcu/tasks.h:1664
> > [inline]
> > RIP: 0010:exit_tasks_rcu_finish+0x122/0x1b0 kernel/rcu/tasks.h:1006
> > Code: fc ff df 48 89 fa 48 c1 ea 03 0f b6 04 02 48 89 fa 83 e2 07 38
> > d0 7f 04 84 c0 75 62 0f b6 85 89 04 00 00 83 e0 01 08 d8 74 0e <0f>
> > 0b 5b 48 89 ef 5d 41 5c e9 00 d2 ff ff 48 b8 00 00 00 00 00 fc
> > RSP: 0018:ffffc9000a497b58 EFLAGS: 00010202
> > RAX: 0000000000000001 RBX: 0000000000000201 RCX: 1ffff11004213b89
> > RDX: 0000000000000001 RSI: 0000000000000000 RDI: ffff88802109dc49
> > RBP: ffff88802109d7c0 R08: 0000000000000005 R09: 0000000000000000
> > R10: 0000000000000000 R11: 0000000000000001 R12: ffff88802109dc40
> > R13: dffffc0000000000 R14: ffff88802109d7c0 R15: ffff88802109ddd0
> > FS: 0000000000000000(0000) GS:ffff8880b9b00000(0000)
> > knlGS:0000000000000000
> > CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> > CR2: 00007f771379d090 CR3: 0000000051fd2000 CR4: 00000000003506e0
> > DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
> > DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
> > Call Trace:
> > <TASK>
> > do_exit+0x18f2/0x2a00 kernel/exit.c:844
> > do_group_exit+0xd2/0x2f0 kernel/exit.c:925
> > get_signal+0x2542/0x2600 kernel/signal.c:2857
>
>And in this case, the do_something_that_causes_task_exit() might be reception of a fatal signal...
>
> > arch_do_signal_or_restart+0x82/0x2300 arch/x86/kernel/signal.c:869
> > exit_to_user_mode_loop kernel/entry/common.c:166 [inline]
> > exit_to_user_mode_prepare+0x15f/0x250 kernel/entry/common.c:201
> > __syscall_exit_to_user_mode_work kernel/entry/common.c:283 [inline]
> > syscall_exit_to_user_mode+0x19/0x50 kernel/entry/common.c:294

>...but it is also impolite to exit to usermode while in an RCU Tasks read-side critical section.
>
>It looks like this kernel was built with CONFIG_PROVE_LOCKING=y, which should have complained if someone was exiting to userspace while in an RCU Tasks Trace read-side critical section.
>
>So, was there tracing of any kind in use during this test?
>
> Thanx, Paul
>
> > do_syscall_64+0x42/0xb0 arch/x86/entry/common.c:86
> > entry_SYSCALL_64_after_hwframe+0x46/0xb0
> > RIP: 0033:0x7f7713689109
> > Code: Unable to access opcode bytes at RIP 0x7f77136890df.
> > RSP: 002b:00007f77125fe218 EFLAGS: 00000246 ORIG_RAX:
> > 00000000000000ca
> > RAX: 0000000000000001 RBX: 00007f771379bf68 RCX: 00007f7713689109
> > RDX: 00000000000f4240 RSI: 0000000000000081 RDI: 00007f771379bf6c
> > RBP: 00007f771379bf60 R08: 00007fff2278d080 R09: 0000000000000000
> > R10: 0000000000000000 R11: 0000000000000246 R12: 00007f771379bf6c
> > R13: 00007fff22724c8f R14: 00007f77125fe300 R15: 0000000000022000
> > </TASK>
> >
> >
> > ---
> > This report is generated by a bot. It may contain errors.
> > See https://goo.gl/tpsmEJ for more information about syzbot.
> > syzbot engineers can be reached at syzkaller@xxxxxxxxxxxxxxxx.
> >
> > syzbot will keep track of this issue. See:
> > https://goo.gl/tpsmEJ#status for how to communicate with syzbot.