Re: [syzbot] [kernel?] WARNING in signal_wake_up_state

From: Eric W. Biederman
Date: Wed Jan 10 2024 - 11:31:04 EST


Linus Torvalds <torvalds@xxxxxxxxxxxxxxxxxxxx> writes:

> Oleg/Eric, can you make any sense of this?
>
> On Tue, 9 Jan 2024 at 10:18, syzbot
> <syzbot+c6d438f2d77f96cae7c2@xxxxxxxxxxxxxxxxxxxxxxxxx> wrote:
>>
>> The issue was bisected to:
>>
>> commit f9010dbdce911ee1f1af1398a24b1f9f992e0080
>
> Hmm. This smells more like a "that triggers the problem" than a cause.
>
> Because the warning itself is
>
>> WARNING: CPU: 1 PID: 5069 at kernel/signal.c:771 signal_wake_up_state+0xfa/0x120 kernel/signal.c:771
>
> That's
>
> lockdep_assert_held(&t->sighand->siglock);
>
> at the top of the function, with the call trace being
>
>> signal_wake_up include/linux/sched/signal.h:448 [inline]
>
> just a wrapper setting 'state'.
>
>> zap_process fs/coredump.c:373 [inline]
>
> That's zap_process() that does a
>
> for_each_thread(start, t) {
>
> and then does a
>
> signal_wake_up(t, 1);
>
> on each thread.
>
>> zap_threads fs/coredump.c:392 [inline]
>
> And this is zap_threads(), which does
>
> spin_lock_irq(&tsk->sighand->siglock);
> ...
> nr = zap_process(tsk, exit_code);
>
> Strange. The sighand->siglock is definitely taken.
>
> The for_each_thread() must be hitting a thread with a different
> sighand, but it's basically a
>
> list_for_each_entry_rcu(..)
>
> walking over the tsk->signal->thread_head list.
>
> But if CLONE_THREAD is set (so that we share that 'tsk->signal'), then
> we always require that CLONE_SIGHAND is also set:
>
> if ((clone_flags & CLONE_THREAD) && !(clone_flags & CLONE_SIGHAND))
> return ERR_PTR(-EINVAL);
>
> so we most definitely should have the same ->sighand if we have the
> same ->signal. And that's true very much for that vhost_task_create()
> case too.
>
> So as far as I can see, that bisected commit does add a new case of
> threaded signal handling, but in no way explains the problem.
>
> Is there some odd exit race? The thread is removed with
>
> list_del_rcu(&p->thread_node);
>
> in __exit_signal -> __unhash_process(), and despite the RCU
> annotations, all these parts seem to hold the right locks too (ie
> sighand->siglock is held by __exit_signal too), so I don't even see
> any delayed de-allocation issue or anything like that.
>
> Thus bringing in Eric/Oleg to see if they see something I miss.
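For reference, the path being discussed is roughly this (simplified from
memory of kernel/signal.c and fs/coredump.c, so the details may not match
the exact tree syzbot was testing):

        /* kernel/signal.c */
        void signal_wake_up_state(struct task_struct *t, unsigned int state)
        {
                lockdep_assert_held(&t->sighand->siglock); /* <-- the WARNING */

                set_tsk_thread_flag(t, TIF_SIGPENDING);
                if (!wake_up_state(t, state | TASK_INTERRUPTIBLE))
                        kick_process(t);
        }

        /* fs/coredump.c */
        static int zap_threads(struct task_struct *tsk, int exit_code)
        {
                ...
                spin_lock_irq(&tsk->sighand->siglock);
                ...
                nr = zap_process(tsk, exit_code); /* for_each_thread() +
                                                     signal_wake_up(t, 1) */
                spin_unlock_irq(&tsk->sighand->siglock);
                ...
        }

So the lock we hold is tsk->sighand->siglock, while the assertion checks
t->sighand->siglock for every t on tsk->signal->thread_head, exactly as
described above.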

I expect this would take going through the rest of the reproducer
to see what is going on.
Hmm. The reproducer is in something other than C:

> # https://syzkaller.appspot.com/bug?id=b7640dae2467568f05425b289a1f004faa2dc292
> # See https://goo.gl/kgGztJ for information about syzkaller reproducers.
> #{"repeat":true,"procs":1,"slowdown":1,"sandbox":"","sandbox_arg":0,"close_fds":false}
> r0 = openat$vnet(0xffffffffffffff9c, &(0x7f0000000040), 0x2, 0x0)
> ioctl$int_in(r0, 0x40000000af01, 0x0)
> r1 = memfd_create(&(0x7f0000000400)='\xa3\x9fn\xb4dR\x04i5\x02\xac\xce\xe1\x88\x9d[@8\xd7\xce\x1f 9I\x7f\x15\x1d\x93=\xb5\xe7\\\'L\xe6\xd2\x8e\xbc)JtTDq\x81\xcf\x81\xba\xe51\xf5 \xc8\x10>\xc9\\\x85\x17L\xbf\xcf\x91\xdfM\xf3\x02^T*\x00\x02\xb9~B\x9f\xacl\x1d3\x06o\xf8\x16H\xaa*\x02\xf7\xfb\x06\xf1\x83\x92\xa8\xc2\xcb\xae\xb0\xb4\x93\xb8\x04\xf1\x99\xc2yY+\xd9y\x8a\xd5b\xe8\"q\x1b0)\xccm\xacz\xc1\xadd\x9b6a\xf3\xdds\xbb\x88\xff\b\x85\xb3s\x00\x0e\xbcfvi\x85\xfc.|\xd4h\xec\x82o\x8e\x93\x11\xc1\xd4\xae\x05\x17=\xd9R\xd0\xd4\x90\xcf\x9b\xdc\xaeV\x88\x94\x9f\xe3\xefqi\xed\xa8w\xbe\xd0\xd0-tBl\x9e+\xd3\xed\xce\x9f\x83\x86\xf9\x12\x16Ts\x80\x13]C\xfb`\xc2`\xf7\x1a\x00\x00\x00\x00\x00\x00\x00k\xae\xcb\x1a.\xc2\x8f\xd1x4]PZ\x9e\xd5Y\xf0L\xa4\xbc\x84\xf6\x04L\xff0\x8b\\*\xf9,\xb6\r\x97\xedy\xe0\x8a\xe2\x8ck\xc6S\xc3g\xb9\x1a\xf8\x8f \x9d\x00u7\xd8\'\xf1E\xa4(Q\x80Fy\xb5\xe4q\xc9\xff \xd8\x9d\xad\x11\xf8m\xd3\xbc\x9e\x10D\x7f!\xca\x0ev\x15h$\x01\xdd\xe5\xce\xf8*\xb3\x01\x85\a\xe4qv&\x9c\xac\x9aN~o\xe5\x89\xd5\a\x9f\f\x1f\xc2e/\x8d\x1e\n\xd0_\xbd!^\xa46\xb8j\xc0x\n\xdb\xe1\xa3\xd6\xae;\r\x92@\xa5I\x88Z1F\xf0\x1at\t\xd0\x8a\x04m\x06\xf3BL\xffS\x9eY\xf4\xb0U \xf8\xd00\x88y\xebX\x92\xd5\xbb\xa1h7\xf3\xe0\x0f\xbd\x02\xe4%\xf9\xb1\x87\x8aM\xfeG\xb2L\xbd\x92-\xcd\x1f\xf4\xe1,\xb7G|\xec\"\xa2\xab\xf6\x84\xe0\xcf1\x9a', 0x0)
> write$binfmt_elf32(r1, &(0x7f0000000140)=ANY=[@ANYBLOB="7f454c466000002ed8e4f97765ce27b90300060000000000000000b738000000000035f4c38422a3bc8220000500000004020300b300000000002a002400b3d7c52ebf31a8d5c8c3c6cb00000009e500d5ffffff05ffffff03004f9ef4"], 0xd8)
> execveat(r1, &(0x7f0000000000)='\x00', 0x0, 0x0, 0x1000)

If I read that correctly, it intends to put an elf32 executable into
a memfd and then execute it.
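Reading through the syzkaller encoding, that is something like the
following plain C (the /dev/vhost-net path, the VHOST_SET_OWNER reading
of the 0x...af01 ioctl, and the placeholder ELF bytes are my
interpretation of the reproducer, not something I have re-verified):

        #define _GNU_SOURCE
        #include <fcntl.h>
        #include <linux/vhost.h>
        #include <sys/ioctl.h>
        #include <sys/mman.h>
        #include <sys/syscall.h>
        #include <unistd.h>

        int main(void)
        {
                /* openat$vnet + ioctl 0x...af01: open the vhost-net device
                 * and issue what looks like VHOST_SET_OWNER, which is what
                 * spawns the vhost worker task. */
                int vhost = openat(AT_FDCWD, "/dev/vhost-net", O_RDWR);
                ioctl(vhost, VHOST_SET_OWNER, 0);

                /* memfd_create + write$binfmt_elf32: stuff a mangled 32-bit
                 * ELF header into an anonymous memfd (placeholder bytes
                 * here; the reproducer writes 0xd8 bytes of corrupted
                 * header starting with \x7fELF). */
                int memfd = memfd_create("repro", 0);
                write(memfd, "\177ELF", 4);

                /* execveat(fd, "", 0, 0, AT_EMPTY_PATH): execute the memfd
                 * itself. */
                syscall(SYS_execveat, memfd, "", NULL, NULL, AT_EMPTY_PATH);
                return 0;
        }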

Exec will possibly unshare the sighand struct, if there is still a
reference to it from somewhere else, to ensure the new process has a
clean one.
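From memory, that is unshare_sighand() in fs/exec.c, which looks roughly
like this (details may be slightly off):

        static int unshare_sighand(struct task_struct *me)
        {
                struct sighand_struct *oldsighand = me->sighand;

                if (refcount_read(&oldsighand->count) != 1) {
                        struct sighand_struct *newsighand;

                        /* ->sighand is shared with a CLONE_SIGHAND but not
                         * CLONE_THREAD task, so switch to a private copy. */
                        newsighand = kmem_cache_alloc(sighand_cachep, GFP_KERNEL);
                        ...
                        refcount_set(&newsighand->count, 1);

                        write_lock_irq(&tasklist_lock);
                        spin_lock(&oldsighand->siglock);
                        memcpy(newsighand->action, oldsighand->action,
                               sizeof(newsighand->action));
                        rcu_assign_pointer(me->sighand, newsighand);
                        spin_unlock(&oldsighand->siglock);
                        write_unlock_irq(&tasklist_lock);

                        __cleanup_sighand(oldsighand);
                }
                return 0;
        }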

But all of the other threads should get shut down by de_thread before
any of that happens. And de_thread should take care of the weird
non-leader execve case as well. So after that point the process really
should be single threaded, which is why de_thread is the point of no
return.
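The core of de_thread() is roughly (again from memory):

        static int de_thread(struct task_struct *tsk)
        {
                struct signal_struct *sig = tsk->signal;
                spinlock_t *lock = &tsk->sighand->siglock;

                if (thread_group_empty(tsk))
                        goto no_thread_group;

                /* Kill all the other threads in the thread group ... */
                spin_lock_irq(lock);
                ...
                sig->group_exec_task = tsk;
                sig->notify_count = zap_other_threads(tsk);
                if (!thread_group_leader(tsk))
                        sig->notify_count--;

                /* ... and wait until they have all exited. */
                while (sig->notify_count) {
                        __set_current_state(TASK_KILLABLE);
                        spin_unlock_irq(lock);
                        schedule();
                        if (__fatal_signal_pending(tsk))
                                goto killed;
                        spin_lock_irq(lock);
                }
                spin_unlock_irq(lock);

                /* ... then the non-leader case: wait for the old leader to
                 * become a zombie and take over its identity. */
                ...
        }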

That whole "an interrupt comes in, and a fatal signal is processed"
scenario is confusing.

Hmm. That weird vnet ioctl at the beginning must be what is starting
the vhost logic. So I guess it makes sense if the signal is received
by the magic vhost thread.
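If I remember the vhost_task code around the bisected commit correctly,
setting the vhost owner ends up in vhost_task_create(), which creates
the worker as a real thread of the owner, something like:

        /* kernel/vhost_task.c, as of the bisected commit (from memory) */
        struct kernel_clone_args args = {
                .flags          = CLONE_FS | CLONE_UNTRACED | CLONE_VM |
                                  CLONE_THREAD | CLONE_SIGHAND,
                .exit_signal    = 0,
                .fn             = vhost_task_fn,
                .name           = name,
                .user_worker    = 1,
                .no_files       = 1,
        };
        ...
        tsk = copy_process(NULL, 0, NUMA_NO_NODE, &args);

With CLONE_THREAD | CLONE_SIGHAND the worker sits on the owner's
tsk->signal->thread_head like any other thread, so it will be visited
by the for_each_thread() walk in zap_process().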

Perhaps there is some weird vhost logic where the thread lingers.

Ugh. I seem to remember a whole conversation about the vhost logic
(immediately after it was merged) and how it had a bug where it exited
differently from everything else. I remember people figuring it was ok
for the moment, since the code had already been merged, everything had
to be performed as root, and no one actually uses the vhost logic like
that. It has been long enough that I thought it would have been sorted
out by now.

Looking back at the original conversation to refresh my memory:
https://lore.kernel.org/all/20230601183232.8384-1-michael.christie@xxxxxxxxxx/

The bisect is 100% correct, and this was a known issue with that code at
the time it was merged.

I will let someone else take it from here.

Eric