Re: [syzbot] [kernel?] WARNING in signal_wake_up_state

From: Linus Torvalds
Date: Tue Jan 09 2024 - 14:06:12 EST


Oleg/Eric, can you make any sense of this?

On Tue, 9 Jan 2024 at 10:18, syzbot
<syzbot+c6d438f2d77f96cae7c2@xxxxxxxxxxxxxxxxxxxxxxxxx> wrote:
>
> The issue was bisected to:
>
> commit f9010dbdce911ee1f1af1398a24b1f9f992e0080

Hmm. This smells more like a "that triggers the problem" than a cause.

Because the warning itself is

> WARNING: CPU: 1 PID: 5069 at kernel/signal.c:771 signal_wake_up_state+0xfa/0x120 kernel/signal.c:771

That's

lockdep_assert_held(&t->sighand->siglock);

at the top of the function, with the call trace being

> signal_wake_up include/linux/sched/signal.h:448 [inline]

just a wrapper setting 'state'.

> zap_process fs/coredump.c:373 [inline]

That's zap_process() that does a

for_each_thread(start, t) {

and then does a

signal_wake_up(t, 1);

on each thread.

> zap_threads fs/coredump.c:392 [inline]

And this is zap_threads(), which does

spin_lock_irq(&tsk->sighand->siglock);
...
nr = zap_process(tsk, exit_code);

Strange. The sighand->siglock is definitely taken.

The for_each_thread() must be hitting a thread with a different
sighand, but it's basically a

list_for_each_entry_rcu(..)

walking over the tsk->signal->thread_head list.

But if CLONE_THREAD is set (so that we share that 'tsk->signal', then
we always require that CLONE_SIGHAND is also set:

if ((clone_flags & CLONE_THREAD) && !(clone_flags & CLONE_SIGHAND))
return ERR_PTR(-EINVAL);

so we most definitely should have the same ->sighand if we have the
same ->signal. And that's true very much for that vhost_task_create()
case too.

So as far as I can see, that bisected commit does add a new case of
threaded signal handling, but in no way explains the problem.

Is there some odd exit race? The thread is removed with

list_del_rcu(&p->thread_node);

in __exit_signal -> __unhash_process(), and despite the RCU
annotations, all these parts seem to hold the right locks too (ie
sighand->siglock is held by __exit_signal too), so I don't even see
any delayed de-allocation issue or anything like that.

Thus bringing in Eric/Oleg to see if they see something I miss.

Original email at

https://lore.kernel.org/all/000000000000a41b82060e875721@xxxxxxxxxx/

for your pleasure.

Linus