Re: [RFC v2] ptrace, pidfd: add pidfd_ptrace syscall

From: Eric W. Biederman
Date: Mon Apr 27 2020 - 14:22:07 EST


Jann Horn <jannh@xxxxxxxxxx> writes:

> On Mon, Apr 27, 2020 at 7:08 PM Christian Brauner
> <christian.brauner@xxxxxxxxxx> wrote:
>> On Sun, Apr 26, 2020 at 06:34:30PM +0200, Hagen Paul Pfeifer wrote:
>> > Working on a safety-critical stress testing tool, using ptrace in an
>> > rather uncommon way (stop, peeking memory, ...) for a bunch of
>> > applications in an automated way I realized that once opened processes
>> > where restarted and PIDs recycled. Resulting in monitoring and
>> > manipulating the wrong processes.
>> >
>> > With the advent of pidfd we are now able to stick with one stable handle
>> > to identifying processes exactly. We now have the ability to get this
>> > race free. Sending signals now works like a charm, next step is to
>> > extend the functionality also for ptrace.
>> >
>> > API:
>> > long pidfd_ptrace(int pidfd, enum __ptrace_request request,
>> > void *addr, void *data, unsigned flags);
>>
>> I'm in general not opposed to this if there's a clear need for this and
>> users that are interested. But I think if people really prefer having
>> this a new syscall then we should probably try to improve on the old
>> one. Things that come to mind right away without doing a deep review are
>> replacing the void *addr pointer with a dedicated struct ptract_args or
>> union ptrace_args and a size argument. If we're not doing something
>> like this or something more fundamental we can equally well either just
>> duplicate all enums in the old ptrace syscall and append a _PIDFD to it
>> where it makes sense.
>
> Yeah, it seems like just adding pidfd flavors of PTRACE_ATTACH and
> PTRACE_SEIZE should do the job.

I am conflicted about that but I have to agree. Instead of
duplicating everything it would be good enough to duplicate the once
that cause the process to be attached to use. Then there would be no
more pid races to worry about.

> And if we do make a new syscall, there is a bunch of de-crufting that
> can be done... for example, just as some low-hanging fruit, a new
> ptrace API probably shouldn't have
> PTRACE_PEEKTEXT/PTRACE_PEEKDATA/PTRACE_POKETEXT/PTRACE_POKEDATA (we
> have /proc/$pid/mem for that, which is much saner than doing peek/poke
> in word-size units), and probably also shouldn't support all the weird
> arch-specific register-accessing requests (e.g.
> PTRACE_PEEKUSR/PTRACE_POKEUSR/PTRACE_GETREGS/PTRACE_SETREGS/PTRACE_GETFPREGS/...)
> that are nowadays accessible via PTRACE_GETREGSET/PTRACE_SETREGSET.


> (And there are also some more major changes that I think would be
> sensible; for example, it'd be neat if you could have notifications
> about the tracee delivered through a pollable file descriptor, and if
> you could get the kernel to tell you in each notification which type
> of ptrace stop you're dealing with (e.g. distinguishing
> syscall-entry-stop vs syscall-exit-stop), and it would be great to be
> able to directly inject syscalls into the child instead of having to
> figure out where a syscall instruction you can abuse is and then
> setting the instruction pointer to that, and it'd be helpful to be
> able to have multiple tracers attached to a single process so that you
> can e.g. have strace and gdb active on the same process
> concurrently...)

How does this differ using the tracing related infrastructure we have
for the kernel on a userspace process? I suspect augmenting the tracing
infrastructure with the ability to set breakpoints and watchpoints (aka
stopping userspace threads and processes might be a more fertile
direction to go).

But I agree either we want to just address the races in PTRACE_ATTACH
and PTRACE_SIEZE or we want to take a good hard look at things.

There is a good case for minimal changes because one of the cases that
comes up is how much work will it take to change existing programs. But
ultimately ptrace pretty much sucks so a very good set of test cases and
documentation for what we want to implement would be a very good idea.

Eric