Re: [syzbot] [kernel?] WARNING in signal_wake_up_state

From: Mike Christie
Date: Thu Jan 11 2024 - 12:20:42 EST


On 1/10/24 10:11 AM, Eric W. Biederman wrote:
> Linus Torvalds <torvalds@xxxxxxxxxxxxxxxxxxxx> writes:
>
>> Oleg/Eric, can you make any sense of this?
>>
>> On Tue, 9 Jan 2024 at 10:18, syzbot
>> <syzbot+c6d438f2d77f96cae7c2@xxxxxxxxxxxxxxxxxxxxxxxxx> wrote:
>>>
>>> The issue was bisected to:
>>>
>>> commit f9010dbdce911ee1f1af1398a24b1f9f992e0080
>>
>> Hmm. This smells more like a "that triggers the problem" than a cause.
>>
>> Because the warning itself is
>>
>>> WARNING: CPU: 1 PID: 5069 at kernel/signal.c:771 signal_wake_up_state+0xfa/0x120 kernel/signal.c:771
>>
>> That's
>>
>> lockdep_assert_held(&t->sighand->siglock);
>>
>> at the top of the function, with the call trace being
>>
>>> signal_wake_up include/linux/sched/signal.h:448 [inline]
>>
>> just a wrapper setting 'state'.
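For reference, in a recent tree that wrapper is roughly the following
(simplified, details vary by kernel version):

    static inline void signal_wake_up(struct task_struct *t, bool fatal)
    {
            unsigned int state = 0;

            if (fatal && !(t->jobctl & JOBCTL_PTRACE_FROZEN)) {
                    t->jobctl &= ~(JOBCTL_STOPPED | JOBCTL_TRACED);
                    state = TASK_WAKEKILL | __TASK_TRACED;
            }
            /* the lockdep_assert_held() fires inside this callee */
            signal_wake_up_state(t, state);
    }

so the splat is purely about the caller's locking, not about the state
computation.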
>>
>>> zap_process fs/coredump.c:373 [inline]
>>
>> That's zap_process() that does a
>>
>> for_each_thread(start, t) {
>>
>> and then does a
>>
>> signal_wake_up(t, 1);
>>
>> on each thread.
>>
>>> zap_threads fs/coredump.c:392 [inline]
>>
>> And this is zap_threads(), which does
>>
>> spin_lock_irq(&tsk->sighand->siglock);
>> ...
>> nr = zap_process(tsk, exit_code);
>>
>> Strange. The sighand->siglock is definitely taken.
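Condensed from fs/coredump.c (roughly; current tree, error paths
dropped), the path in question is:

    static int zap_process(struct task_struct *start, int exit_code)
    {
            struct task_struct *t;
            int nr = 0;

            start->signal->flags = SIGNAL_GROUP_EXIT;
            start->signal->group_exit_code = exit_code;
            start->signal->group_stop_count = 0;

            for_each_thread(start, t) {
                    task_clear_jobctl_pending(t, JOBCTL_PENDING_MASK);
                    if (t != current && !(t->flags & PF_POSTCOREDUMP)) {
                            sigaddset(&t->pending.signal, SIGKILL);
                            signal_wake_up(t, 1);  /* <-- where it warns */
                            nr++;
                    }
            }
            return nr;
    }

with zap_threads() holding tsk->sighand->siglock across the whole call,
exactly as quoted above.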
>>
>> The for_each_thread() must be hitting a thread with a different
>> sighand, but it's basically a
>>
>> list_for_each_entry_rcu(..)
>>
>> walking over the tsk->signal->thread_head list.
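The macro is approximately (include/linux/sched/signal.h):

    #define __for_each_thread(signal, t) \
            list_for_each_entry_rcu(t, &(signal)->thread_head, thread_node)

    #define for_each_thread(p, t) \
            __for_each_thread((p)->signal, t)

so every thread it visits hangs off the same signal_struct by
construction.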
>>
>> But if CLONE_THREAD is set (so that we share that 'tsk->signal'), then
>> we always require that CLONE_SIGHAND is also set:
>>
>> if ((clone_flags & CLONE_THREAD) && !(clone_flags & CLONE_SIGHAND))
>> return ERR_PTR(-EINVAL);
>>
>> so we most definitely should have the same ->sighand if we have the
>> same ->signal. And that's very much true for that vhost_task_create()
>> case too.
>>
>> So as far as I can see, that bisected commit does add a new case of
>> threaded signal handling, but in no way explains the problem.
>>
>> Is there some odd exit race? The thread is removed with
>>
>> list_del_rcu(&p->thread_node);
>>
>> in __exit_signal -> __unhash_process(), and despite the RCU
>> annotations, all these parts seem to hold the right locks too (ie
>> sighand->siglock is held by __exit_signal too), so I don't even see
>> any delayed de-allocation issue or anything like that.
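For completeness, the exit side is roughly (kernel/exit.c, condensed):

    static void __exit_signal(struct task_struct *tsk)
    {
            bool group_dead = thread_group_leader(tsk);
            struct sighand_struct *sighand;

            sighand = rcu_dereference_check(tsk->sighand,
                            lockdep_tasklist_lock_is_held());
            spin_lock(&sighand->siglock);
            ...
            __unhash_process(tsk, group_dead); /* list_del_rcu(&tsk->thread_node) */
            ...
            spin_unlock(&sighand->siglock);
    }

so the removal from the thread list really does happen under siglock.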
>>
>> Thus bringing in Eric/Oleg to see if they see something I miss.
>
> I expect this would take going through the rest of the reproducer
> to see what is going on.
> Hmm. The reproducer is in something other than C:
>
>> # https://syzkaller.appspot.com/bug?id=b7640dae2467568f05425b289a1f004faa2dc292
>> # See https://goo.gl/kgGztJ for information about syzkaller reproducers.
>> #{"repeat":true,"procs":1,"slowdown":1,"sandbox":"","sandbox_arg":0,"close_fds":false}
>> r0 = openat$vnet(0xffffffffffffff9c, &(0x7f0000000040), 0x2, 0x0)
>> ioctl$int_in(r0, 0x40000000af01, 0x0)
>> r1 = memfd_create(&(0x7f0000000400)='\xa3\x9fn\xb4dR\x04i5\x02\xac\xce\xe1\x88\x9d[@8\xd7\xce\x1f 9I\x7f\x15\x1d\x93=\xb5\xe7\\\'L\xe6\xd2\x8e\xbc)JtTDq\x81\xcf\x81\xba\xe51\xf5 \xc8\x10>\xc9\\\x85\x17L\xbf\xcf\x91\xdfM\xf3\x02^T*\x00\x02\xb9~B\x9f\xacl\x1d3\x06o\xf8\x16H\xaa*\x02\xf7\xfb\x06\xf1\x83\x92\xa8\xc2\xcb\xae\xb0\xb4\x93\xb8\x04\xf1\x99\xc2yY+\xd9y\x8a\xd5b\xe8\"q\x1b0)\xccm\xacz\xc1\xadd\x9b6a\xf3\xdds\xbb\x88\xff\b\x85\xb3s\x00\x0e\xbcfvi\x85\xfc.|\xd4h\xec\x82o\x8e\x93\x11\xc1\xd4\xae\x05\x17=\xd9R\xd0\xd4\x90\xcf\x9b\xdc\xaeV\x88\x94\x9f\xe3\xefqi\xed\xa8w\xbe\xd0\xd0-tBl\x9e+\xd3\xed\xce\x9f\x83\x86\xf9\x12\x16Ts\x80\x13]C\xfb`\xc2`\xf7\x1a\x00\x00\x00\x00\x00\x00\x00k\xae\xcb\x1a.\xc2\x8f\xd1x4]PZ\x9e\xd5Y\xf0L\xa4\xbc\x84\xf6\x04L\xff0\x8b\\*\xf9,\xb6\r\x97\xedy\xe0\x8a\xe2\x8ck\xc6S\xc3g\xb9\x1a\xf8\x8f \x9d\x00u7\xd8\'\xf1E\xa4(Q\x80Fy\xb5\xe4q\xc9\xff \xd8\x9d\xad\x11\xf8m\xd3\xbc\x9e\x10D\x7f!\xca\x0ev\x15h$\x01\xdd\xe5\xce\xf8*\xb3\x01\x85\a\xe4qv&\x9c\xac\x9aN~o\xe5\x89\xd5\a\x9f\f\x1f\xc2e/\x8d\x1e\n\xd0_\xbd!^\xa46\xb8j\xc0x\n\xdb\xe1\xa3\xd6\xae;\r\x92@\xa5I\x88Z1F\xf0\x1at\t\xd0\x8a\x04m\x06\xf3BL\xffS\x9eY\xf4\xb0U \xf8\xd00\x88y\xebX\x92\xd5\xbb\xa1h7\xf3\xe0\x0f\xbd\x02\xe4%\xf9\xb1\x87\x8aM\xfeG\xb2L\xbd\x92-\xcd\x1f\xf4\xe1,\xb7G|\xec\"\xa2\xab\xf6\x84\xe0\xcf1\x9a', 0x0)
>> write$binfmt_elf32(r1, &(0x7f0000000140)=ANY=[@ANYBLOB="7f454c466000002ed8e4f97765ce27b90300060000000000000000b738000000000035f4c38422a3bc8220000500000004020300b300000000002a002400b3d7c52ebf31a8d5c8c3c6cb00000009e500d5ffffff05ffffff03004f9ef4"], 0xd8)
>> execveat(r1, &(0x7f0000000000)='\x00', 0x0, 0x0, 0x1000)
>
> If I read that correctly it is intending to put an elf32 executable into
> a memfd and then execute it.
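Written out as plain C, the tail of the reproducer is roughly the
following (the vnet open/ioctl is omitted, the blob contents are
elided, and this assumes a glibc recent enough to wrap memfd_create()
and execveat()):

    #define _GNU_SOURCE
    #include <fcntl.h>
    #include <sys/mman.h>
    #include <unistd.h>

    /* stand-in for the reproducer's mostly-bogus ELF32 header */
    static const unsigned char elf32_blob[] = { 0x7f, 'E', 'L', 'F', /* ... */ };

    int main(void)
    {
            int fd = memfd_create("payload", 0);

            write(fd, elf32_blob, sizeof(elf32_blob));

            /* 0x1000 is AT_EMPTY_PATH: execute the fd itself, path "" */
            execveat(fd, "", NULL, NULL, AT_EMPTY_PATH);
            return 0;
    }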
>
> Exec will possibly unshare the sighand struct if there is still a
> reference to it from somewhere else, to ensure the new process has a
> clean one.
>
> But all of the other threads should get shut down by de_thread before
> any of that happens. And de_thread should take care of the weird
> non-leader execve case as well. So after that point the process really
> should be single-threaded. Which is why de_thread is the point of no
> return.
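The relevant core of de_thread, heavily condensed from fs/exec.c
(roughly):

    sig->group_exec_task = tsk;
    sig->notify_count = zap_other_threads(tsk);
    ...
    while (sig->notify_count) {
            __set_current_state(TASK_KILLABLE);
            spin_unlock_irq(lock);
            schedule();
            if (__fatal_signal_pending(tsk))
                    goto killed;
            spin_lock_irq(lock);
    }

i.e. it blocks until every other exiting thread has accounted for
itself in notify_count.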
>
> That whole "interrupt comes in, and a fatal signal is processed"
> scenario is confusing.
>
> Hmm. That weird vnet ioctl at the beginning must be what is starting
> the vhost logic. So I guess it makes sense if the signal is received
> by the magic vhost thread.
>
> Perhaps there is some weird vhost logic where the thread lingers.
>
> Ugh. I seem to remember a whole conversation about the vhost logic
> (immediately after it was merged) and how it had a bug where it exited
> differently from everything else. I remember people figuring it was

The vhost code just wanted to wait for in-flight IO from the vhost
thread's context before exiting, instead of working like io_uring,
which waits from a secondary thread.
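Conceptually the worker looks something like this (a simplified sketch,
not the exact drivers/vhost/vhost.c code; vhost_run_work_list() is an
illustrative stand-in for the work-list draining):

    static int vhost_worker_fn(void *data)
    {
            struct vhost_worker *worker = data;

            for (;;) {
                    /* the owner signals teardown via the vhost_task API */
                    if (vhost_task_should_stop(worker->vtsk))
                            break;

                    /* complete queued/in-flight work from this thread */
                    vhost_run_work_list(worker);
            }
            return 0;
    }

The design choice is that the draining happens on the vhost thread
itself, which is what makes its exit path differ from everything else.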

> immediately ok after the code was merged, because everything had to
> be performed as root, and no one actually uses the vhost logic like
> that. It has been long enough I thought that would have been sorted
> out by now.
>
> Looking back to refresh my memory at the original conversation:
> https://lore.kernel.org/all/20230601183232.8384-1-michael.christie@oracle.com/
>
> The bisect is 100% correct, and this was a known issue with that code at
> the time it was merged.
>

I have patches for this that I've been testing. I'll post them to the
virt list for the next feature window.