Re: WARNING in task_participate_group_stop

From: Dmitry Vyukov
Date: Mon Nov 06 2017 - 06:02:47 EST


On Thu, Nov 2, 2017 at 6:01 PM, Oleg Nesterov <oleg@xxxxxxxxxx> wrote:
> On 11/01, Dmitry Vyukov wrote:
>>
>> On Tue, Oct 31, 2017 at 7:34 PM, Oleg Nesterov <oleg@xxxxxxxxxx> wrote:
>> > Hmm. I do not see reproducer in this email...
>>
>> Ah, sorry. You can see full thread with attachments here:
>> https://groups.google.com/forum/#!topic/syzkaller-bugs/EUmYZU4m5gU
>
> Heh. I can't say I enjoyed reading the reproducer ;)
>
>> >> > WARNING: CPU: 0 PID: 1 at kernel/signal.c:340
>> >> > task_participate_group_stop+0x1ce/0x230 kernel/signal.c:340
>> >> > Kernel panic - not syncing: panic_on_warn set ...
>> >> >
>> >> > CPU: 0 PID: 1 Comm: init Not tainted 4.13.0-mm1+ #5
>> >
>> > So this is init process with SIGNAL_UNKILLABLE flag set. And I hope it has
>> > the pending SIGKILL, otherwise there is something else.
>
> From repro.c
>
> line 111 r[8] = syscall(__NR_ptrace, 0x10ul, r[7]);
>
> this is PTRACE_ATTACH
>
> line 115 syscall(__NR_ptrace, 0x4200ul, r[7], 0x40000012ul, 0x100012ul);
>
> this is PTRACE_SETOPTIONS and "data" includes PTRACE_O_EXITKILL.
>
> r[7] is initialized at
>
> line 110 r[7] = *(uint32_t*)0x20f9cffc;
>
> so if it is eq to 1 then it can attach to init and in this case the problem
> can be explained by the wrong SIGNAL_UNKILLABLE/SIGKILL logic.
>
> But how *(uint32_t*)0x20f9cffc can be 1 ?
>
> line 108 r[6] = syscall(__NR_fcntl, r[1], 0x10ul, 0x20f9cff8ul);
>
> this is F_GETOWN_EX, addr = 0x20f9cff8 == 0x20f9cffc - 4, so if fcntl()
> actually succeeds then r[7] == f_owner_ex->pid.
>
> It _can_ be 1, but the reproducer doesn't work for me. If you can reproduce,
> could you try the patch below?
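
For reference, the constants in the analysis above can be checked against
the userspace headers. This is only a minimal sketch, assuming a Linux
system with glibc headers; REPRO_OWNER_ADDR, repro_pid_addr() and
repro_has_exitkill() are hypothetical names made up for illustration and
are not part of repro.c.

```c
#define _GNU_SOURCE		/* for struct f_owner_ex and F_GETOWN_EX */
#include <assert.h>
#include <fcntl.h>
#include <stddef.h>
#include <stdint.h>
#include <sys/ptrace.h>

/* Raw syscall arguments from repro.c match the named requests. */
static_assert(PTRACE_ATTACH == 0x10, "line 111 is PTRACE_ATTACH");
static_assert(PTRACE_SETOPTIONS == 0x4200, "line 115 is PTRACE_SETOPTIONS");
static_assert(F_GETOWN_EX == 0x10, "line 108 is F_GETOWN_EX");

/* struct f_owner_ex is { type; pid; }, so pid sits 4 bytes in. */
static_assert(offsetof(struct f_owner_ex, pid) == 4, "pid offset");

/* Address that repro.c passes to fcntl(F_GETOWN_EX) (hypothetical name). */
#define REPRO_OWNER_ADDR 0x20f9cff8ul

/* Address of the pid field inside that buffer, i.e. where r[7] is read. */
static inline uintptr_t repro_pid_addr(void)
{
	return REPRO_OWNER_ADDR + offsetof(struct f_owner_ex, pid);
}

/* Nonzero if a PTRACE_SETOPTIONS data word asks for PTRACE_O_EXITKILL. */
static inline int repro_has_exitkill(unsigned long data)
{
	return (data & PTRACE_O_EXITKILL) != 0;
}
```

With these helpers, repro_pid_addr() gives 0x20f9cffc, the address read at
line 110, and repro_has_exitkill(0x100012ul) is true for the "data"
argument at line 115.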

Hi,

I would like to understand why you were not able to reproduce it. I
won't always be available to help, and we are tracking hundreds of
bugs across different Linux kernels and other OSes, so it's
not feasible to do extensive work on each of them. That's why we try
to provide reproducers.

I've just tried the repro on the latest upstream
(39dae59d66acd86d1de24294bd2f343fd5e7a625) and it triggered the
WARNING within a second.
Did you use the config provided? Did you use qemu or real hardware?
Can you try in qemu (with -smp>1)?

> diff --git a/kernel/signal.c b/kernel/signal.c
> index 800a18f..7e15b56 100644
> --- a/kernel/signal.c
> +++ b/kernel/signal.c
> @@ -78,7 +78,7 @@ static int sig_task_ignored(struct task_struct *t, int sig, bool force)
>  	handler = sig_handler(t, sig);
>
>  	if (unlikely(t->signal->flags & SIGNAL_UNKILLABLE) &&
> -	    handler == SIG_DFL && !force)
> +	    handler == SIG_DFL && !(force && sig_kernel_only(sig)))
>  		return 1;
>
>  	return sig_handler_ignored(handler, sig);
> @@ -94,13 +94,15 @@ static int sig_ignored(struct task_struct *t, int sig, bool force)
>  	if (sigismember(&t->blocked, sig) || sigismember(&t->real_blocked, sig))
>  		return 0;
>
> -	if (!sig_task_ignored(t, sig, force))
> -		return 0;
> -
>  	/*
> -	 * Tracers may want to know about even ignored signals.
> +	 * Tracers may want to know about even ignored signal unless it
> +	 * is SIGKILL which can't be reported anyway but can be ignored
> +	 * by SIGNAL_UNKILLABLE task.
>  	 */
> -	return !t->ptrace;
> +	if (t->ptrace && sig != SIGKILL)
> +		return 0;
> +
> +	return sig_task_ignored(t, sig, force);
>  }
>
>  /*
> @@ -929,9 +931,9 @@ static void complete_signal(int sig, struct task_struct *p, int group)
>  	 * then start taking the whole group down immediately.
>  	 */
>  	if (sig_fatal(p, sig) &&
> -	    !(signal->flags & (SIGNAL_UNKILLABLE | SIGNAL_GROUP_EXIT)) &&
> +	    !(signal->flags & SIGNAL_GROUP_EXIT) &&
>  	    !sigismember(&t->real_blocked, sig) &&
> -	    (sig == SIGKILL || !t->ptrace)) {
> +	    (sig == SIGKILL || !p->ptrace)) {
>  		/*
>  		 * This signal will be fatal to the whole group.
>  		 */
>
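
To make the effect of the first two hunks easier to follow, here is a
simplified userspace model of the sig_ignored() decision before and after
the patch. It is only a sketch under stated assumptions: struct model_task,
enum model_handler and all model_*/old_*/new_* names are hypothetical, the
blocked/real_blocked checks are collapsed into one flag, and none of this
is kernel code.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <signal.h>
#include <stdbool.h>

/* Hypothetical stand-ins for the kernel's handler and task state. */
enum model_handler { MODEL_SIG_DFL, MODEL_SIG_IGN, MODEL_SIG_USER };

struct model_task {
	bool unkillable;		/* SIGNAL_UNKILLABLE is set */
	bool ptraced;			/* t->ptrace != 0 */
	bool blocked;			/* sig is in blocked or real_blocked */
	enum model_handler handler;	/* disposition for the signal */
};

/* Signals that cannot be caught or ignored. */
static inline bool model_kernel_only(int sig)
{
	return sig == SIGKILL || sig == SIGSTOP;
}

/* Signals whose default action is "ignore". */
static inline bool model_kernel_ignore(int sig)
{
	return sig == SIGCONT || sig == SIGCHLD ||
	       sig == SIGWINCH || sig == SIGURG;
}

static inline bool model_handler_ignored(enum model_handler h, int sig)
{
	return h == MODEL_SIG_IGN ||
	       (h == MODEL_SIG_DFL && model_kernel_ignore(sig));
}

/* Model of sig_ignored() before the patch. */
static inline bool old_sig_ignored(const struct model_task *t, int sig,
				   bool force)
{
	bool task_ignored;

	if (t->blocked)
		return false;
	task_ignored = (t->unkillable && t->handler == MODEL_SIG_DFL && !force) ||
		       model_handler_ignored(t->handler, sig);
	if (!task_ignored)
		return false;
	/* Tracers see even ignored signals, SIGKILL included. */
	return !t->ptraced;
}

/* Model of sig_ignored() with the patch applied. */
static inline bool new_sig_ignored(const struct model_task *t, int sig,
				   bool force)
{
	if (t->blocked)
		return false;
	if (t->ptraced && sig != SIGKILL)
		return false;
	if (t->unkillable && t->handler == MODEL_SIG_DFL &&
	    !(force && model_kernel_only(sig)))
		return true;
	return model_handler_ignored(t->handler, sig);
}
```

In this model, a traced SIGNAL_UNKILLABLE task that receives an unforced
SIGKILL is not ignored by old_sig_ignored() but is ignored by
new_sig_ignored(), which is the traced-init case discussed above. A forced
SIGKILL is still delivered in the patched version.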