Re: [PATCH 1/2] wait/ptrace: always assume __WALL if the child is traced

From: Denys Vlasenko
Date: Wed Oct 21 2015 - 16:34:55 EST


On 10/21/2015 09:59 PM, Denys Vlasenko wrote:
> On 10/21/2015 12:31 AM, Andrew Morton wrote:
>> On Tue, 20 Oct 2015 19:17:54 +0200 Oleg Nesterov <oleg@xxxxxxxxxx> wrote:
>>
>>> The following program (a simplified version of one generated by syzkaller)
>>>
>>> #include <pthread.h>
>>> #include <unistd.h>
>>> #include <sys/ptrace.h>
>>> #include <stdio.h>
>>> #include <signal.h>
>>>
>>> void *thread_func(void *arg)
>>> {
>>> 	ptrace(PTRACE_TRACEME, 0,0,0);
>>> 	return 0;
>>> }
>>>
>>> int main(void)
>>> {
>>> 	pthread_t thread;
>>>
>>> 	if (fork())
>>> 		return 0;
>>>
>>> 	while (getppid() != 1)
>>> 		;
>>>
>>> 	pthread_create(&thread, NULL, thread_func, NULL);
>>> 	pthread_join(thread, NULL);
>>> 	return 0;
>>> }
>>>
>>> creates an unreapable zombie if /sbin/init doesn't use __WALL.
>>>
>>> This is not a kernel bug, at least in the sense that everything works as
>>> expected: debugger should reap a traced sub-thread before it can reap
>>> the leader, but without __WALL/__WCLONE do_wait() ignores sub-threads.
>>>
>>> Unfortunately, it seems that /sbin/init in most (all?) distributions
>>> doesn't use it and we have to change the kernel to avoid the problem.
>>
>> Well, to fix this a distro needs to roll out a new kernel. Or a new
>> init(8). Is there any reason to believe that distributing/deploying a
>> new kernel is significantly easier for everyone? Because fixing init
>> sounds like a much preferable solution to this problem.
>
> People will continue to write new init(8) implementations,
> and they will miss this obscure case.
>
> Before this bug was found, it was considered possible to use
> a shell script as init process. What now, every shell needs to add
> __WALL to its waitpids?
>
> The use of PTRACE_TRACEME in this reproducer is clearly pathological:
> PTRACE_TRACEME was never intended to be used to attach to unsuspecting
> processes.
>
> How about making PTRACE_TRACEME fail in this case?

Something like this:


diff --git a/kernel/ptrace.c b/kernel/ptrace.c
index 787320d..285a58c 100644
--- a/kernel/ptrace.c
+++ b/kernel/ptrace.c
@@ -385,6 +385,17 @@ static int ptrace_traceme(void)
 	write_lock_irq(&tasklist_lock);
 	/* Are we already being traced? */
 	if (!current->ptrace) {
+		struct pid_namespace *pid_ns;
+
+		pid_ns = task_active_pid_ns(current->parent);
+		if (current->parent == pid_ns->child_reaper) {
+			/*
+			 * Our parent is init. We may be a reparented process
+			 * used for PTRACE_TRACEME zombie attack on init.
+			 */
+			goto nope;
+		}
+
 		ret = security_ptrace_traceme(current->parent);
 		/*
 		 * Check PF_EXITING to ensure ->real_parent has not passed
@@ -395,6 +406,7 @@ static int ptrace_traceme(void)
 			current->ptrace = PT_PTRACED;
 			__ptrace_link(current, current->real_parent);
 		}
+ nope: ;
 	}
 	write_unlock_irq(&tasklist_lock);
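
For comparison, the userspace alternative discussed above (init passing
__WALL to its wait calls) might look roughly like the sketch below. This is
only an illustration under assumptions, not code from any real init:
reap_all_children() is a hypothetical helper name, the fallback #define uses
the Linux value of __WALL in case libc does not expose it, and a tracee that
stops rather than exits is simply not counted, with no attempt to resume or
detach it.

```c
#define _GNU_SOURCE
#include <errno.h>
#include <sys/types.h>
#include <sys/wait.h>

#ifndef __WALL
#define __WALL 0x40000000	/* Linux value; assumed if libc hides it */
#endif

/*
 * Hypothetical helper: wait for every child of the caller, passing
 * __WALL so that children which would otherwise need __WCLONE
 * (such as a traced sub-thread) are not skipped.
 *
 * Blocks until no children remain. Returns the number of children
 * that exited or were killed, or -1 on an unexpected error.
 */
static int reap_all_children(void)
{
	int reaped = 0;

	for (;;) {
		int status;
		pid_t pid = waitpid(-1, &status, __WALL);

		if (pid > 0) {
			/* Count only real terminations, not ptrace stops. */
			if (WIFEXITED(status) || WIFSIGNALED(status))
				reaped++;
			continue;
		}
		if (pid < 0 && errno == EINTR)
			continue;	/* interrupted by a signal, retry */
		if (pid < 0 && errno == ECHILD)
			return reaped;	/* nothing left to wait for */
		return -1;
	}
}
```

A shell used as init would have to do the equivalent in every place it
calls waitpid(), which is the burden the kernel-side check is meant to avoid.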

