Re: [RFC PATCH] Implement /proc/pid/kill

From: Christian Brauner
Date: Tue Oct 30 2018 - 06:40:45 EST


On Tue, Oct 30, 2018 at 11:39:11AM +0100, Christian Brauner wrote:
> On Tue, Oct 30, 2018 at 08:50:22AM +0000, Daniel Colascione wrote:
> > On Tue, Oct 30, 2018 at 3:21 AM, Joel Fernandes <joelaf@xxxxxxxxxx> wrote:
> > > On Mon, Oct 29, 2018 at 3:11 PM Daniel Colascione <dancol@xxxxxxxxxx> wrote:
> > >>
> > >> Add a simple proc-based kill interface. To use /proc/pid/kill, just
> > >> write the signal number in base-10 ASCII to the kill file of the
> > >> process to be killed: for example, 'echo 9 > /proc/$$/kill'.
> > >>
> > >> Semantically, /proc/pid/kill works like kill(2), except that the
> > >> process ID comes from the proc filesystem context instead of from an
> > >> explicit system call parameter. This way, it's possible to avoid races
> > >> between inspecting some aspect of a process and that process's PID
> > >> being reused for some other process.
> > >>
> > >> With /proc/pid/kill, it's possible to write a proper race-free and
> > >> safe pkill(1). An approximation follows. A real program might use
> > >> openat(2), having opened a process's /proc/pid directory explicitly,
> > >> with the directory file descriptor serving as a sort of "process
> > >> handle".
> > >
> > > How long does the 'inspection' procedure take? If its a short
> > > duration, then is PID reuse really an issue, I mean the PIDs are not
> > > reused until wrap around and the only reason this can be a problem is
> > > if you have the wrap around while the 'inspecting some aspect'
> > > procedure takes really long.
> >
> > It's a race. Would you make similar statements about a similar fix for
> > a race condition involving a mutex and a double-free just because the
> > race didn't crash most of the time? The issue I'm trying to fix here
> > is the same problem, one level higher up in the abstraction hierarchy.
> >
> > > Also the proc fs is typically not the right place for this. Some
> > > entries in proc are writeable, but those are for changing values of
> > > kernel data structures. The title of man proc(5) is "proc - process
> > > information pseudo-filesystem". So its "information" right?
> >
> > Why should userspace care whether a particular operation is "changing
> > [a] value[] of [a] kernel data structure" or something else? That
> > something in /proc is a struct field is an implementation detail. It's
> > the interface semantics that matters, and whether a particular
> > operation is achieved by changing a struct field or by making a
> > function call is irrelevant to userspace. Proc is a filesystem about
> > processes. Why shouldn't you be able to send a signal to a process via
> > proc? It's an operation involving processes.
> >
> > It's already possible to do things *to* processes via proc, e.g.,
> > adjust OOM killer scores. Proc filesystem file descriptors are
> > userspace references to kernel-side struct pid instances, and as such,
> > make good process handles. There are already "verb" files in procfs,
> > such as /proc/sys/vm/drop_caches and /proc/sysrq-trigger. Why not add
> > a kill "verb", especially if it closes a race that can't be closed
> > some other way?
> >
> > You could implement this interface as a system call that took a procfs
> > directory file descriptor, but relative to this proposal, it would be
> > all downside. Such a thing would act just the same way as
> > /pric/pid/kill, and wouldn't be usable from the shell or from programs
> > that didn't want to use syscall(2). (Since glibc isn't adding new
> > system call wrappers.) AFAIK, the only downside of having a "kill"
> > file is the need for a string-to-integer conversion, but compared to
> > process killing, integer parsing is insignificant.
> >
> > > IMO without a really good reason for this, it could really be a hard
> > > sell but the RFC was worth it anyway to discuss it ;-)
> >
> > The traditional unix process API is down there at level -10 of Rusty
> > Russel's old bad API scale: "It's impossible to get right". The races
> > in the current API are unavoidable. That most programs don't hit these
> > races most of the time doesn't mean that the race isn't present.
> >
> > We've moved to a model where we identify other system resources, like
> > DRM fences, locks, sockets, and everything else via file descriptors.
> > This change is a step toward using procfs file descriptors to work
> > with processes, which makes the system more regular and easier to
> > reason about. A clean API that's possible to use correctly is a
> > worthwhile project.
>
> So I have been disucssing a new process API With David Howells, Kees
> Cook and a few others and I am working on an RFC/proposal for this. It
> is partially inspired by the new mount API. So I would like to block
> this patch until then. I would like to get this right very much and I
> don't think this is the way to go. I hope to have a more detailed
> proposal out soon(ish). David and I were also thinking about an adhoc
> session at the kernel summit but we aren't clear whether there's still a
> slot.

It's also entertaining since I talked with Dylan Reid at Google about
this during {O,L}SS too. :)