Re: [Syzkaller & bisect] There is "task hung in synchronize_rcu bisect" in v6.3-rc2 kernel

From: Jens Axboe
Date: Thu Apr 13 2023 - 08:40:25 EST


On 4/13/23 5:19 AM, Frederic Weisbecker wrote:
> On Thu, Apr 13, 2023 at 10:45:35AM +0800, Pengfei Xu wrote:
>> Hi Huafei and kernel experts,
>>
>> This is a gentle reminder.
>> The issue can still be reproduced on the v6.3-rc6 kernel.
>> It reproduces on Alder Lake, Raptor Lake, and other x86 platforms.
>> After reverting commit 0e792b89e6800c ("ftrace: Fix use-after-free for
>> dynamic ftrace_ops") on top of v6.3-rc6, the issue was gone.
>>
>> The new syzkaller reproducer, repro.report, bisect_info.log, and detailed logs
>> are available at:
>> https://github.com/xupengfe/syzkaller_logs/tree/main/230412_031722_synchronize_rcu
>
> I just tested against v6.3-rc6 and again all I get is this io_ring-related
> issue:
>
> [ 448.290752] INFO: task kworker/u4:0:9 blocked for more than 294 seconds.
> [ 448.293868] Not tainted 6.3.0-rc6-kvm #1
> [ 448.296019] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
> [ 448.299592] task:kworker/u4:0 state:D stack:0 pid:9 ppid:2 flags:0x00004000
> [ 448.303397] Workqueue: events_unbound io_ring_exit_work
> [ 448.305884] Call Trace:
> [ 448.307147] <TASK>
> [ 448.308166] __schedule+0x422/0xc90
> [ 448.309824] ? wait_for_completion+0x77/0x170
> [ 448.311870] schedule+0x63/0xd0
> [ 448.313346] schedule_timeout+0x2fe/0x4c0
> [ 448.315255] ? __this_cpu_preempt_check+0x1c/0x30
> [ 448.317360] ? _raw_spin_unlock_irq+0x27/0x60
> [ 448.319400] ? lockdep_hardirqs_on+0x88/0x120
> [ 448.321395] ? wait_for_completion+0x77/0x170
> [ 448.323462] wait_for_completion+0x9e/0x170
> [ 448.325356] io_ring_exit_work+0x2b0/0x810
> [ 448.327300] ? __pfx_io_tctx_exit_cb+0x10/0x10
> [ 448.329345] ? _raw_spin_unlock_irq+0x27/0x60
> [ 448.331397] process_one_work+0x34e/0x720
> [ 448.333212] ? __pfx_io_ring_exit_work+0x10/0x10
> [ 448.335377] ? process_one_work+0x34e/0x720
> [ 448.337295] worker_thread+0x4e/0x530
> [ 448.339079] ? __pfx_worker_thread+0x10/0x10
> [ 448.341008] kthread+0x128/0x160
> [ 448.342513] ? __pfx_kthread+0x10/0x10
> [ 448.344305] ret_from_fork+0x2c/0x50
> [ 448.346016] </TASK>
> [ 448.347176]
> [ 448.347176] Showing all locks held in the system:
> [ 448.349887] 2 locks held by kworker/u4:0/9:
> [ 448.351829] #0: ffff88807eb6dd38 ((wq_completion)events_unbound){+.+.}-{0:0}, at: process_one_work+0x2b4/0x720
> [ 448.356362] #1: ffffc9000005fe68 ((work_completion)(&ctx->exit_work)){+.+.}-{0:0}, at: process_one_work+0x2b4/0x720
> [ 448.361052] 1 lock held by rcu_tasks_kthre/11:
> [ 448.363142] #0: ffffffff83963450 (rcu_tasks.tasks_gp_mutex){+.+.}-{3:3}, at: rcu_tasks_one_gp+0x31/0x440
> [ 448.367396] 1 lock held by rcu_tasks_rude_/12:
> [ 448.369387] #0: ffffffff839631d0 (rcu_tasks_rude.tasks_gp_mutex){+.+.}-{3:3}, at: rcu_tasks_one_gp+0x31/0x440
> [ 448.373808] 1 lock held by rcu_tasks_trace/13:
> [ 448.375852] #0: ffffffff83962f10 (rcu_tasks_trace.tasks_gp_mutex){+.+.}-{3:3}, at: rcu_tasks_one_gp+0x31/0x440
> [ 448.380434] 1 lock held by khungtaskd/30:
> [ 448.382219] #0: ffffffff83963e60 (rcu_read_lock){....}-{1:2}, at: debug_show_all_locks+0x17/0x1d0
> [ 448.386291]
> [ 448.387159] =============================================
> [ 448.387159]

Is there a reproducer for this one so we can take a look? It's not
impossible to get into this state if you muck with signals, e.g. the ring
exit work is queued but needs requests to complete, and that is prevented
by deliberately sending SIGSTOP to the task that needs to complete them.
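
For reference, here is a rough, untested userspace sketch of the kind of
sequence I mean, assuming liburing is available; it is not the syzkaller
reproducer, just an illustration. A second task submits to the ring so it
picks up its own io_uring task context, closes its copy of the ring fd,
and is then SIGSTOPped before the last reference to the ring is dropped,
so the exit work ends up waiting on task_work that the stopped task can
never run:

/*
 * Hypothetical illustration only, not the reported reproducer.
 * Build (assuming liburing is installed):
 *   gcc -o exit-hang-sketch exit-hang-sketch.c -luring
 */
#include <liburing.h>
#include <signal.h>
#include <stdlib.h>
#include <unistd.h>
#include <sys/wait.h>

int main(void)
{
	struct io_uring ring;
	pid_t pid;

	if (io_uring_queue_init(8, &ring, 0) < 0)
		exit(1);

	pid = fork();
	if (pid == 0) {
		/* Child: submit a nop so this task gets an io_uring task
		 * context for the ring, then drop our copy of the ring fd
		 * and sit around waiting to be stopped. */
		struct io_uring_sqe *sqe = io_uring_get_sqe(&ring);

		io_uring_prep_nop(sqe);
		io_uring_submit(&ring);
		close(ring.ring_fd);
		for (;;)
			pause();
	}

	/* Parent: stop the child so it can never return to userspace and
	 * run the task_work queued for the ring teardown... */
	sleep(1);
	kill(pid, SIGSTOP);

	/* ...then drop the last userspace reference to the ring.  The
	 * exit work now has to wait on the stopped child, and the hung
	 * task detector should eventually flag the blocked kworker. */
	io_uring_queue_exit(&ring);

	sleep(600);
	kill(pid, SIGKILL);
	waitpid(pid, NULL, 0);
	return 0;
}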

--
Jens Axboe