Re: [Syzkaller & bisect] There is "task hung in synchronize_rcu bisect" in v6.3-rc2 kernel

From: Pengfei Xu
Date: Wed Apr 12 2023 - 22:47:35 EST


Hi Huafei and kernel experts,

It's a soft remind.
This issue could be reproduced in v6.3-rc6 kernel.
It could be reproduced on Alder lake, Raptor lake and so on x86 platforms.
After reverted the commit "0e792b89e6800c:ftrace: Fix use-after-free for
dynamic ftrace_ops" on top of v6.3-rc6 kernel, this issue was gone.

New syzkaller reproduced code, repro.report, bisect_info.log and detailed logs
are in link: https://github.com/xupengfe/syzkaller_logs/tree/main/230412_031722_synchronize_rcu

Thanks!
BR.

On 2023-03-15 at 20:20:44 +0800, Pengfei Xu wrote:
> Hi Li Huafei and kernel experts,
>
> Greeting!
>
> Platform: x86 platforms
> There is "task hung in synchronize_rcu bisect" in v6.3-rc2 kernel in guest:
>
> All detailed info: https://github.com/xupengfe/syzkaller_logs/tree/main/230315_023443_synchronize_rcu
> Reproduced code: https://github.com/xupengfe/syzkaller_logs/blob/main/230315_023443_synchronize_rcu/repro.c
> Kconfig: https://github.com/xupengfe/syzkaller_logs/blob/main/230315_023443_synchronize_rcu/kconfig_origin
> Bisect info: https://github.com/xupengfe/syzkaller_logs/blob/main/230315_023443_synchronize_rcu/bisect_info.log
> v6.3-rc2 issue dmesg: https://github.com/xupengfe/syzkaller_logs/blob/main/230315_023443_synchronize_rcu/eeac8ede17557680855031c6f305ece2378af326_dmesg.log
> "
> [ 24.844767] memfd_create() without MFD_EXEC nor MFD_NOEXEC_SEAL, pid=330 'systemd'
> [ 31.392668] hrtimer: interrupt took 10726570 ns
> [ 62.552360] rcu_tasks_wait_gp: rcu_tasks grace period number 9 (since boot) is 2942 jiffies old.
> [ 92.760386] rcu_tasks_wait_gp: rcu_tasks grace period number 9 (since boot) is 10494 jiffies old.
> [ 182.808389] rcu_tasks_wait_gp: rcu_tasks grace period number 9 (since boot) is 33006 jiffies old.
> [ 300.120396] INFO: task repro:540 blocked for more than 147 seconds.
> [ 300.120937] Not tainted 6.3.0-rc2-eeac8ede1755+ #1
> [ 300.121351] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
> [ 300.121888] task:repro state:D stack:0 pid:540 ppid:538 flags:0x00004006
> [ 300.122552] Call Trace:
> [ 300.122835] <TASK>
> [ 300.123267] __schedule+0x40a/0xc90
> [ 300.124351] ? wait_for_completion+0x7b/0x180
> [ 300.124772] schedule+0x5b/0xe0
> [ 300.125156] schedule_timeout+0x50b/0x670
> [ 300.125829] ? schedule_timeout+0x9/0x670
> [ 300.126414] ? wait_for_completion+0x7b/0x180
> [ 300.126826] wait_for_completion+0xa6/0x180
> [ 300.127478] __wait_rcu_gp+0x136/0x160
> [ 300.127922] ? __pfx_arch_ftrace_ops_list_func+0x10/0x10
> [ 300.128639] synchronize_rcu_tasks_generic.part.24+0x3b/0x60
> [ 300.129092] ? __pfx_call_rcu_tasks+0x10/0x10
> [ 300.129499] ? __pfx_wakeme_after_rcu+0x10/0x10
> [ 300.130115] ? verify_cpu+0x10/0x100
> [ 300.130574] synchronize_rcu_tasks_generic+0x24/0x70
> [ 300.131020] synchronize_rcu_tasks+0x19/0x20
> [ 300.131438] ftrace_shutdown+0x1cc/0x410
> [ 300.132206] unregister_ftrace_function+0x35/0x230
> [ 300.132722] ? __sanitizer_cov_trace_switch+0x57/0xa0
> [ 300.133311] perf_ftrace_event_register+0x95/0xf0
> [ 300.133750] ? __pfx_tp_perf_event_destroy+0x10/0x10
> [ 300.134190] perf_trace_destroy+0x3a/0xa0
> [ 300.134590] ? __pfx_tp_perf_event_destroy+0x10/0x10
> [ 300.135028] tp_perf_event_destroy+0x1e/0x30
> [ 300.135473] _free_event+0x101/0x810
> [ 300.136191] put_event+0x3c/0x50
> [ 300.136605] perf_event_release_kernel+0x2de/0x360
> [ 300.137011] ? perf_event_release_kernel+0x9/0x360
> [ 300.137577] ? __pfx_perf_release+0x10/0x10
> [ 300.138031] perf_release+0x22/0x30
> [ 300.138434] __fput+0x11f/0x450
> [ 300.139038] ____fput+0x1e/0x30
> [ 300.139425] task_work_run+0xb6/0x120
> [ 300.140164] do_exit+0x547/0x1360
> [ 300.140632] ? write_comp_data+0x2f/0x90
> [ 300.141223] do_group_exit+0x5e/0xf0
> [ 300.141732] get_signal+0x15d1/0x1600
> [ 300.142596] arch_do_signal_or_restart+0x33/0x280
> [ 300.143463] exit_to_user_mode_prepare+0x13b/0x210
> [ 300.143955] syscall_exit_to_user_mode+0x2d/0x60
> [ 300.144623] do_syscall_64+0x4a/0x90
> [ 300.145032] entry_SYSCALL_64_after_hwframe+0x72/0xdc
> [ 300.145459] RIP: 0033:0x7f2a7abd859d
> [ 300.145823] RSP: 002b:00007ffcc732b338 EFLAGS: 00000246 ORIG_RAX: 000000000000012a
> [ 300.146429] RAX: 0000000000000003 RBX: 0000000000000000 RCX: 00007f2a7abd859d
> [ 300.146930] RDX: 0000000000000000 RSI: 00000000ffffffff RDI: 00000000200004c0
> [ 300.147436] RBP: 00007ffcc732b350 R08: 0000000000000000 R09: 00007ffcc732b350
> [ 300.147938] R10: 00000000ffffffff R11: 0000000000000246 R12: 00000000004011e0
> [ 300.148645] R13: 00007ffcc732b470 R14: 0000000000000000 R15: 0000000000000000
> [ 300.149713] </TASK>
> [ 300.149978]
> [ 300.149978] Showing all locks held in the system:
> [ 300.150435] 1 lock held by rcu_tasks_kthre/11:
> [ 300.150808] #0: ffffffff83f63490 (rcu_tasks.tasks_gp_mutex){+.+.}-{3:3}, at: rcu_tasks_one_gp+0x31/0x420
> [ 300.151849] 1 lock held by rcu_tasks_rude_/12:
> [ 300.152593] #0: ffffffff83f63210 (rcu_tasks_rude.tasks_gp_mutex){+.+.}-{3:3}, at: rcu_tasks_one_gp+0x31/0x420
> [ 300.153645] 1 lock held by rcu_tasks_trace/13:
> [ 300.154013] #0: ffffffff83f62f50 (rcu_tasks_trace.tasks_gp_mutex){+.+.}-{3:3}, at: rcu_tasks_one_gp+0x31/0x420
> [ 300.155078] 1 lock held by khungtaskd/29:
> [ 300.155425] #0: ffffffff83f63ea0 (rcu_read_lock){....}-{1:2}, at: debug_show_all_locks+0x1b/0x1e0
> [ 300.156644] 2 locks held by repro/540:
> [ 300.156977] #0: ffffffff840206a8 (event_mutex){+.+.}-{3:3}, at: perf_trace_destroy+0x21/0xa0
> [ 300.157971] #1: ffffffff8401cd68 (ftrace_lock){+.+.}-{3:3}, at: unregister_ftrace_function+0x2b/0x230
> [ 300.159005]
> [ 300.159233] =============================================
> [ 300.159233]
> "
> Bisected and found first bad commit:
> "
> 0e792b89e6800cd9cb4757a76a96f7ef3e8b6294
> ftrace: Fix use-after-free for dynamic ftrace_ops
> "
> After reverted the above commit on top of v6.3-rc2 kernel, this issue was gone.
>
> And this issue could be reproduced in 2100s.
>
> There was no this issue report in syzbot link:
> https://syzkaller.appspot.com/upstream
>
> I hope it's helpful.
>
> Thanks!
>
> ---
>
> If you don't need the following environment to reproduce the problem or if you
> already have one, please ignore the following information.
>
> How to reproduce:
> git clone https://gitlab.com/xupengfe/repro_vm_env.git
> cd repro_vm_env
> tar -xvf repro_vm_env.tar.gz
> cd repro_vm_env; ./start3.sh // it needs qemu-system-x86_64 and I used v7.1.0
> // start3.sh will load bzImage_2241ab53cbb5cdb08a6b2d4688feb13971058f65 v6.2-rc5 kernel
> // You could change the bzImage_xxx as you want
> You could use below command to log in, there is no password for root.
> ssh -p 10023 root@localhost
>
> After login vm(virtual machine) successfully, you could transfer reproduced
> binary to the vm by below way, and reproduce the problem in vm:
> gcc -pthread -o repro repro.c
> scp -P 10023 repro root@localhost:/root/
>
> Get the bzImage for target kernel:
> Please use target kconfig and copy it to kernel_src/.config
> make olddefconfig
> make -jx bzImage //x should equal or less than cpu num your pc has
>
> Fill the bzImage file into above start3.sh to load the target kernel in vm.
>
>
> Tips:
> If you already have qemu-system-x86_64, please ignore below info.
> If you want to install qemu v7.1.0 version:
> git clone https://github.com/qemu/qemu.git
> cd qemu
> git checkout -f v7.1.0
> mkdir build
> cd build
> yum install -y ninja-build.x86_64
> ../configure --target-list=x86_64-softmmu --enable-kvm --enable-vnc --enable-gtk --enable-sdl
> make
> make install
>
> Thanks!
> BR.