Re: INFO: task hung in grab_super

From: Tetsuo Handa
Date: Wed Jul 18 2018 - 09:36:32 EST


On 2018/07/18 22:04, Dmitry Vyukov wrote:
> On Wed, Jul 18, 2018 at 2:53 PM, Tetsuo Handa
> <penguin-kernel@xxxxxxxxxxxxxxxxxxx> wrote:
>> On 2018/07/18 20:41, Dmitry Vyukov wrote:
>>> This seems to be related to 9p. After rerunning the log I got:
>>>
>>> root@syzkaller:~# ps afxu | grep syz
>>> root 18253 0.0 0.0 0 0 ttyS0 Zl 10:16 0:00 \_
>>> [syz-executor] <defunct>
>>> root@syzkaller:~# cat /proc/18253/task/*/stack
>>> [<0>] p9_client_rpc+0x3a2/0x1400
>>> [<0>] p9_client_flush+0x134/0x2a0
>>> [<0>] p9_client_rpc+0x122c/0x1400
>>> [<0>] p9_client_create+0xc56/0x16af
>>> [<0>] v9fs_session_init+0x21a/0x1a80
>>> [<0>] v9fs_mount+0x7c/0x900
>>> [<0>] mount_fs+0xae/0x328
>>> [<0>] vfs_kern_mount.part.34+0xdc/0x4e0
>>> [<0>] do_mount+0x581/0x30e0
>>> [<0>] ksys_mount+0x12d/0x140
>>> [<0>] __x64_sys_mount+0xbe/0x150
>>> [<0>] do_syscall_64+0x1b9/0x820
>>> [<0>] entry_SYSCALL_64_after_hwframe+0x49/0xbe
>>> [<0>] 0xffffffffffffffff
>>>
>>> There is a bunch of hangs in 9p, so let's do:
>>>
>>> #syz dup: INFO: task hung in flush_work
>>>
>> Then, is dumping all threads when khungtaskd fires a candidate
>> for CONFIG_DEBUG_AID_FOR_SYZBOT=y path?
>
> Perhaps would be useful. But maybe only tasks that are blocked for
> more than timeout/2? and/or unkillable tasks? killable tasks are not a
> problem.

TASK_KILLABLE waiters are not reported by khungtaskd, are they?

	/* use "==" to skip the TASK_KILLABLE tasks waiting on NFS */
	if (t->state == TASK_UNINTERRUPTIBLE)
		check_hung_task(t, timeout);
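
To illustrate why the "==" comparison skips them, here is a small userspace sketch. It assumes TASK_KILLABLE is defined as (TASK_WAKEKILL | TASK_UNINTERRUPTIBLE); the numeric values are illustrative and may differ between kernel versions, and both helper functions are hypothetical names, not kernel APIs.

```c
#include <stdbool.h>

/* Assumed values for illustration; check include/linux/sched.h for your tree. */
#define TASK_UNINTERRUPTIBLE	0x0002
#define TASK_WAKEKILL		0x0100
#define TASK_KILLABLE		(TASK_WAKEKILL | TASK_UNINTERRUPTIBLE)

/* Hypothetical helper: mirrors the "==" test used by khungtaskd above. */
static bool selected_by_equality(long state)
{
	return state == TASK_UNINTERRUPTIBLE;
}

/* Hypothetical alternative: a mask test would also select TASK_KILLABLE. */
static bool selected_by_mask(long state)
{
	return (state & TASK_UNINTERRUPTIBLE) != 0;
}
```

With these definitions, selected_by_equality(TASK_KILLABLE) is false while selected_by_mask(TASK_KILLABLE) is true, so a killable waiter is never passed to check_hung_task() by the "==" form.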

And TASK_KILLABLE waiters can become a problem because

>
> Btw, I see that p9_client_rpc uses wait_event_killable, why wasn't it
> killed along with the whole process?
>

wait_event_killable() would return -ERESTARTSYS if it got SIGKILL.
But if (c->status == Connected) && (type == P9_TFLUSH) is also true,
it ignores SIGKILL by retrying the loop...

again:
	err = wait_event_killable(*req->wq, req->status >= REQ_STATUS_RCVD);
	if ((err == -ERESTARTSYS) && (c->status == Connected) && (type == P9_TFLUSH)) {
		sigpending = 1;
		clear_thread_flag(TIF_SIGPENDING);
		goto again;
	}
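
A rough userspace model of that retry loop, only to show the control flow. Everything here (struct fake_req, fake_wait_killable(), model_rpc_wait(), and the retry budget) is hypothetical and exists just so the model terminates; the real loop has no such bound, and the real wait depends on the server's reply.

```c
#include <stdbool.h>

#define ERESTARTSYS	512	/* kernel-internal errno value */
#define P9_TFLUSH	108	/* 9P Tflush message type */

/* Hypothetical request: the reply arrives after this many waits; negative = never. */
struct fake_req {
	int replies_after;
};

/* Stand-in for wait_event_killable() with SIGKILL pending on every call. */
static int fake_wait_killable(struct fake_req *req)
{
	if (req->replies_after == 0)
		return 0;		/* reply received */
	if (req->replies_after > 0)
		req->replies_after--;
	return -ERESTARTSYS;		/* interrupted by SIGKILL */
}

/*
 * Returns the number of swallowed SIGKILLs once the reply arrives,
 * -ERESTARTSYS if the signal is honoured, or -1 when the budget runs
 * out (the model's stand-in for "the task hangs").
 */
static int model_rpc_wait(struct fake_req *req, bool connected, int type, int budget)
{
	int retries = 0;

	for (;;) {
		int err = fake_wait_killable(req);

		if (err == -ERESTARTSYS && connected && type == P9_TFLUSH) {
			if (++retries > budget)
				return -1;
			continue;	/* same effect as "goto again" */
		}
		return err ? err : retries;
	}
}
```

If the server never replies to the flush (replies_after < 0), only the artificial budget ends the loop, which corresponds to the unkillable task in the report.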

I wish they didn't ignore SIGKILL (e.g. by offloading operations to a kernel thread).