Re: [seccomp] Request for a "enable on execve" mode for Seccomp filters

From: Andy Lutomirski
Date: Wed Oct 28 2020 - 20:04:12 EST


On Wed, Oct 28, 2020 at 3:47 PM Kees Cook <keescook@xxxxxxxxxxxx> wrote:
>
> On Wed, Oct 28, 2020 at 12:18:47PM +0100, Camille Mougey wrote:
> > (This is my first message to the kernel list, I hope I'm doing it right)
>
> 1- self-confinement
> 2- launching external processes
>    a) cooperating
>    b) oblivious

I remain quite unconvinced that delayed filters will solve a real
problem. As you described, 2a could just confine itself. There's an
obvious synchronization point -- sd_notify(). I bet sd_notify() could
be rigged up to apply externally-supplied filters, or sd_notify()
could interact with user notifiers to get some assistance.

2b is nasty. In an ideal world, we would materialize a fully formed
process with filters installed. The problem is that processes don't
generally come fully formed. Almost all interesting processes are
dynamically linked, and they get to specify their own dynamic linkers.
Even if we limit ourselves to a known dynamic linker, we would want to
make sure that the dynamic linker is hardened against various escape
techniques. For dynamic linking, we would probably want to start out
with one set of privileges (loading libraries) and then switch.

I have an alternative suggestion to try to address some of the above:
allow a notifier to run in a mode in which it can replace the BPF
program outright. This would be something like:

    if (fork() != 0)
        return;  // do parent stuff

    // Start up.  Set a BPF program that directs pretty much everything
    // at the listener.
    int fd = seccomp(..., SECCOMP_FILTER_FLAG_NEW_LISTENER |
                     SECCOMP_FILTER_FLAG_ALLOW_REPLACEMENT, ...);

    // Set up other things if needed.

    execve();
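
To make "directs pretty much everything at the listener" a bit more concrete, here is a rough sketch of such a catch-all program using only what exists today (classic BPF plus SECCOMP_RET_USER_NOTIF). The function name build_notify_filter(), the caller-supplied allow list, and the choice to kill on an architecture mismatch are my own assumptions for illustration, not part of the proposal, and SECCOMP_FILTER_FLAG_ALLOW_REPLACEMENT itself remains hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <linux/audit.h>
#include <linux/filter.h>
#include <linux/seccomp.h>

#ifndef SECCOMP_RET_USER_NOTIF
#define SECCOMP_RET_USER_NOTIF 0x7fc00000U  /* fallback for pre-5.0 headers */
#endif

/*
 * Hypothetical helper: fill insns[] with a filter that allows the syscalls
 * listed in allowed[] and sends every other syscall to the user notifier.
 * Returns the number of instructions written, or 0 if cap is too small.
 */
size_t build_notify_filter(struct sock_filter *insns, size_t cap,
                           unsigned int audit_arch,
                           const int *allowed, size_t n_allowed)
{
    size_t need = 5 + 2 * n_allowed;
    size_t n = 0, i;

    assert(insns != NULL);
    assert(allowed != NULL || n_allowed == 0);
    if (cap < need)
        return 0;

    /* Kill the process if the syscall ABI is not the one we expect. */
    insns[n++] = (struct sock_filter)BPF_STMT(BPF_LD | BPF_W | BPF_ABS,
                        offsetof(struct seccomp_data, arch));
    insns[n++] = (struct sock_filter)BPF_JUMP(BPF_JMP | BPF_JEQ | BPF_K,
                        audit_arch, 1, 0);
    insns[n++] = (struct sock_filter)BPF_STMT(BPF_RET | BPF_K,
                        SECCOMP_RET_KILL_PROCESS);

    /* Load the syscall number and compare it against the allow list. */
    insns[n++] = (struct sock_filter)BPF_STMT(BPF_LD | BPF_W | BPF_ABS,
                        offsetof(struct seccomp_data, nr));
    for (i = 0; i < n_allowed; i++) {
        /* On a match fall through to ALLOW, otherwise skip over it. */
        insns[n++] = (struct sock_filter)BPF_JUMP(BPF_JMP | BPF_JEQ | BPF_K,
                            (unsigned int)allowed[i], 0, 1);
        insns[n++] = (struct sock_filter)BPF_STMT(BPF_RET | BPF_K,
                            SECCOMP_RET_ALLOW);
    }

    /* Everything else is handed to whoever holds the listener fd. */
    insns[n++] = (struct sock_filter)BPF_STMT(BPF_RET | BPF_K,
                        SECCOMP_RET_USER_NOTIF);
    return n;
}
```

The result would be wrapped in a struct sock_fprog and installed with seccomp(SECCOMP_SET_MODE_FILTER, SECCOMP_FILTER_FLAG_NEW_LISTENER, &prog) after prctl(PR_SET_NO_NEW_PRIVS, 1, 0, 0, 0). A real setup would need to think harder about which syscalls are safe to allow outright.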

Now, in the parent, once the child is ready for its final filters:

    // Replace the filter on *all* processes using the filter to which
    // we're attached.
    // I think the locking for this should be straightforward.
    // Optional flag here to remove the ALLOW_REPLACEMENT flag, but it's
    // not really necessary since we're about to close() the listener.
    ioctl(fd, SECCOMP_IOCTL_NOTIF_REPLACE_FILTER, new_filter);

    // Call recv in a loop to drain and handle notifications.
    for (...) {
        ioctl(fd, SECCOMP_IOCTL_NOTIF_RECV, ...);
        ...
    }

    close(fd);

And now we're done. We can make the synchronization point be anything we like.
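
For what it's worth, the drain step needs nothing new. Below is a minimal sketch using the existing SECCOMP_IOCTL_NOTIF_RECV / SECCOMP_IOCTL_NOTIF_SEND ioctls. The helper names make_continue_response() and drain_notifications() are made up for illustration, and answering every pending notification with SECCOMP_USER_NOTIF_FLAG_CONTINUE is just one possible policy that I am assuming here; a real supervisor might want to inspect each request instead.

```c
#include <assert.h>
#include <errno.h>
#include <poll.h>
#include <string.h>
#include <sys/ioctl.h>
#include <linux/seccomp.h>

#ifndef SECCOMP_USER_NOTIF_FLAG_CONTINUE
#define SECCOMP_USER_NOTIF_FLAG_CONTINUE (1UL << 0)  /* pre-5.5 headers */
#endif

/* Hypothetical helper: build a "let the syscall proceed" reply. */
void make_continue_response(const struct seccomp_notif *req,
                            struct seccomp_notif_resp *resp)
{
    assert(req != NULL && resp != NULL);
    memset(resp, 0, sizeof(*resp));  /* error and val must be 0 here */
    resp->id = req->id;
    resp->flags = SECCOMP_USER_NOTIF_FLAG_CONTINUE;
}

/*
 * Hypothetical helper: answer every notification currently queued on
 * listener_fd without blocking.  Returns how many were handled, or -1
 * on an unexpected error.
 */
int drain_notifications(int listener_fd)
{
    int handled = 0;

    for (;;) {
        struct pollfd pfd = { .fd = listener_fd, .events = POLLIN };
        struct seccomp_notif req;
        struct seccomp_notif_resp resp;
        int ready = poll(&pfd, 1, 0);

        if (ready < 0) {
            if (errno == EINTR)
                continue;
            return -1;
        }
        if (ready == 0 || !(pfd.revents & POLLIN))
            return handled;  /* nothing (more) queued */

        memset(&req, 0, sizeof(req));  /* the kernel rejects a dirty buffer */
        if (ioctl(listener_fd, SECCOMP_IOCTL_NOTIF_RECV, &req) < 0) {
            if (errno == EINTR || errno == ENOENT)
                continue;  /* the target went away; try again */
            return -1;
        }

        make_continue_response(&req, &resp);
        if (ioctl(listener_fd, SECCOMP_IOCTL_NOTIF_SEND, &resp) < 0 &&
            errno != ENOENT)
            return -1;
        handled++;
    }
}
```

The parent would call drain_notifications(fd) right before close(fd). Note that this uses fixed-size structs for brevity; SECCOMP_GET_NOTIF_SIZES is the more forward-compatible way to size the buffers.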


What do you all think? For people who really want
delay-until-execve(), this can emulate it efficiently.