Re: [GIT PULL] Please pull proc and exec work for 5.7-rc1

From: Linus Torvalds
Date: Sat Apr 11 2020 - 17:57:46 EST


On Sat, Apr 11, 2020 at 2:21 PM Linus Torvalds
<torvalds@xxxxxxxxxxxxxxxxxxxx> wrote:
>
> You're confused.
>
> There *was* that callback already. It happened when the thread exited and became a zombie. No ordering has changed.

Also, stop calling this a "deadlock" like you do, when you're talking
about the kernel behavior.

That only - again - shows that you're confused.

The kernel at no point was deadlocked.

The kernel had two events - the exit of one thread, and the execve()
of another - that it wanted to return to the tracer.

The tracer isn't listening to the first one, and the kernel is waiting
for the tracer to do so. The tracer wants to get the second one, but
it's not happening until the tracer has dealt with the first one.

That's a deadlock FOR THE TRACER.

But it's not a deadlock for the kernel. See the difference?

It literally is absolutely not at all different from a user space
application that deadlocks on a full pipe buffer, because it's not
emptying it - and it's not emptying it because it's trying to just
write more data to it.

See? It's a user space bug. It really is that simple.

Here, I'll give you this program just so that you can run it and see
for yourself how it hangs:

#include <unistd.h>

char buffer[1024*1024];

int main(void)
{
int fd[2];
pipe(fd);
write(fd[1], buffer, sizeof(buffer));
read(fd[0], buffer, sizeof(buffer));
}

and it's *exactly* the same thing. The kernel buffer for the pipe
fills up, so the write() ends up hanging. There's a read() there to
empty the buffer, but that program will never get to it, because the
write hangs.

Is the above a kernel deadlock? No.

And anybody who calls it a "deadlock" when talking to kernel people is
confused and should be ignored.

So stop calling the ptrace() issue a deadlock when talking to kernel
people - it just means annoys and confuses them. I literally thought
something completely different was going on initially because you and
Eric called it a deadlock: I thought that one thread was tracing
another, and that the SIGKILL didn't resolve it.

But no. It's not a kernel deadlock at all, it's something much
"simpler". It is the same exact thing as the stupid buggy example
program above, except instead of using "write()" and waiting for
something that will never happen because the write needs another thing
to happen first, the buggy ptrace test program is using 'ptrace()' to
wait for something that will never happen.

Now, admittedly there is _one_ big difference: ptrace() is a garbage
interface. Nobody disputes that.

ptrace() is absolutely horrible for so many reasons. One reason it's
horrible is that it doesn't have a nonblocking mode with poll(), like
the write() above would have, and that people who use read and write
can do. You can fix the stupid deadlock with the above read/write loop
by switching over to nonblocking IO and a poll loop - doing the same
with ptrace is more difficult, because ptrace() just isn't that kind
of interface.

And with read/write, you could also (and this is even more common)
just used multiple processes (or threads), so that one process does
reading, and another does writing. Again, ptrace() is a shitty
interface and doesn't really allow for that very well, although maybe
it could be made to work that way too.

But even with ptrace, you could have made the tracing program set a
signal handler for SIGCHLD *before* it started doing forks and stuff,
and then exit of the first thread should have caused a signal, and you
could have reaped it there, and it should work with ptrace.

But that garbage ptrace() test program didn't do even that. So it ends
up hanging, waiting for an event that never happens, because it didn't
do that other thing it needed to do - *EXACTLY* like the "write()"
ends up hanging, waiting for that read() that will never happen,
because the garbage test-program did things wrong.

So as long as you guys keep talking about "deadlocks", the _only_
thing you are showing is that you don't understand the problem.

It's not a deadlock - it's a buggy test.

Now, what I've suggested are a couple of ways to make ptrace()
friendlier to use, and thus allow that stupid test to work the way you
seem to want it to work.

Basically, execve() doesn't have a lot of reasons to really wait for
the threads it waits for, and the only real thing it needs to do is to
make sure they are killed. But maybe it can just ignore them once they
are dead.

Or alternatively, it can just say "if you didn't reap the zombie
threads, I'll reap them for you".

Or, as mentioned, we can do nothing, and just let buggy user space
tracing programs hang. It's not the _kernels_ problem if you write
buggy code in user space, this is not anything we've ever helped with
before, so it's not like it's a regression.

(Although it can be a regression for your buggy program if you used to
- for example - do threading entirely using some user-space library,
and didn't have threads that the kernel knew about, and so when your
program did 'execve()' it just got rid of those threads
automatically).

Linus