Re: [RFC,PATCH 1/2] seccomp_filters: system call filtering using BPF

From: Jamie Lokier
Date: Thu Jan 12 2012 - 13:00:40 EST


Will Drewry wrote:
> On Thu, Jan 12, 2012 at 9:43 AM, Steven Rostedt <rostedt@xxxxxxxxxxx> wrote:
> > On Wed, 2012-01-11 at 11:25 -0600, Will Drewry wrote:
> >
> >> Filter programs may _only_ cross the execve(2) barrier if last filter
> >> program was attached by a task with CAP_SYS_ADMIN capabilities in its
> >> user namespace.  Once a task-local filter program is attached from a
> >> process without privileges, execve will fail.  This ensures that only
> >> privileged parent task can affect its privileged children (e.g., setuid
> >> binary).
> >
> > This means that a non privileged user can not run another program with
> > limited features? How would a process exec another program and filter
> > it? I would assume that the filter would need to be attached first and
> > then the execv() would be performed. But after the filter is attached,
> > the execv is prevented?
>
> Yeah - it means tasks can filter themselves, but not each other.
> However, you can inject a filter for any dynamically linked executable
> using LD_PRELOAD.
>
> > Maybe I don't understand this correctly.
>
> You're right on. This was to ensure that one process didn't cause
> crazy behavior in another. I think Alan has a better proposal than
> mine below. (Goes back to catching up.)

You can already use ptrace() to cause crazy behaviour in another
process, including modifying registers arbitrarily at syscall entry
and exit, aborting and emulating syscalls.
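
As a rough illustration of the syscall entry/exit stops mentioned above,
here is a minimal sketch in C. The helper name count_syscall_stops() is
made up for this example; it only counts the stops a tracer sees and does
not touch registers, since register access (e.g. PTRACE_GETREGS) is
architecture-specific.

```c
#include <signal.h>
#include <stddef.h>
#include <sys/ptrace.h>
#include <sys/syscall.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper: fork a child, trace it with PTRACE_SYSCALL and
 * count the syscall-entry and syscall-exit stops reported to the tracer.
 * Returns the number of stops, or -1 if tracing could not be set up. */
static int count_syscall_stops(void)
{
    int status;
    int stops = 0;
    pid_t child = fork();

    if (child < 0)
        return -1;

    if (child == 0) {
        /* Child: ask to be traced, then stop so the parent can attach options. */
        if (ptrace(PTRACE_TRACEME, 0, NULL, NULL) < 0)
            _exit(1);
        raise(SIGSTOP);
        syscall(SYS_getpid);    /* traced: one entry stop and one exit stop */
        _exit(0);
    }

    /* Parent: wait for the child's initial SIGSTOP. */
    if (waitpid(child, &status, 0) < 0 || !WIFSTOPPED(status))
        return -1;

    /* Report syscall stops as SIGTRAP|0x80 so they are distinguishable. */
    if (ptrace(PTRACE_SETOPTIONS, child, NULL,
               (void *)(long)PTRACE_O_TRACESYSGOOD) < 0)
        goto fail;

    for (;;) {
        /* Resume the child until its next syscall entry or exit. */
        if (ptrace(PTRACE_SYSCALL, child, NULL, NULL) < 0)
            goto fail;
        if (waitpid(child, &status, 0) < 0)
            return -1;
        if (WIFEXITED(status) || WIFSIGNALED(status))
            break;
        if (WIFSTOPPED(status) && WSTOPSIG(status) == (SIGTRAP | 0x80))
            stops++;            /* a tracer could inspect or alter registers here */
    }
    return stops;

fail:
    kill(child, SIGKILL);
    waitpid(child, &status, 0);
    return -1;
}
```

Every syscall costs the tracee two stops and the tracer several context
switches, which is where the slowness comes from.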

ptrace() is quite slow and it would be really nice to speed it up,
especially for trapping a small subset of syscalls, or limiting some
kinds of access to some file descriptors, while everything else runs
at normal speed.

Speeding up ptrace() with BPF filters would be really nice. Not
that I like ptrace(), but sometimes it's the only thing you can rely on.
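
To make the idea concrete, here is a hedged sketch of what such a filter
could look like. No BPF-driven ptrace interface is defined in this thread,
so the sketch borrows assumptions: the filter input layout is taken from
struct seccomp_data and the return codes from <linux/seccomp.h> as found
in later mainline headers, and the names trap_filter, run_filter() and
filter_verdict() are invented for the example. The tiny evaluator only
understands the three opcodes used here and exists purely so the filter
logic can be exercised in userspace.

```c
#include <stddef.h>
#include <string.h>
#include <linux/filter.h>
#include <linux/seccomp.h>
#include <sys/syscall.h>

/* Illustrative filter: hand openat() to the tracer, let everything else run. */
static const struct sock_filter trap_filter[] = {
    /* A = syscall number */
    BPF_STMT(BPF_LD | BPF_W | BPF_ABS, offsetof(struct seccomp_data, nr)),
    /* if (A == SYS_openat) fall through to TRACE, else skip to ALLOW */
    BPF_JUMP(BPF_JMP | BPF_JEQ | BPF_K, SYS_openat, 0, 1),
    BPF_STMT(BPF_RET | BPF_K, SECCOMP_RET_TRACE),
    BPF_STMT(BPF_RET | BPF_K, SECCOMP_RET_ALLOW),
};

/* Toy evaluator for the three opcodes above; anything else is rejected. */
static unsigned int run_filter(const struct sock_filter *prog, size_t len,
                               const struct seccomp_data *sd)
{
    unsigned int acc = 0;

    for (size_t pc = 0; pc < len; pc++) {
        const struct sock_filter *insn = &prog[pc];

        switch (insn->code) {
        case BPF_LD | BPF_W | BPF_ABS:
            if (insn->k > sizeof(*sd) - sizeof(acc))
                return SECCOMP_RET_KILL;    /* out-of-bounds load */
            memcpy(&acc, (const char *)sd + insn->k, sizeof(acc));
            break;
        case BPF_JMP | BPF_JEQ | BPF_K:
            pc += (acc == insn->k) ? insn->jt : insn->jf;
            break;
        case BPF_RET | BPF_K:
            return insn->k;
        default:
            return SECCOMP_RET_KILL;        /* opcode not handled by this toy */
        }
    }
    return SECCOMP_RET_KILL;                /* ran off the end of the program */
}

/* Convenience wrapper: verdict of trap_filter for a given syscall number. */
static unsigned int filter_verdict(int nr)
{
    struct seccomp_data sd;

    memset(&sd, 0, sizeof(sd));
    sd.nr = nr;
    return run_filter(trap_filter, sizeof(trap_filter) / sizeof(trap_filter[0]), &sd);
}
```

With a filter like this only the trapped syscall would pay the tracer
round trip. A real filter would also need to check the architecture
field before trusting the syscall number.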

LD_PRELOAD and code running in the target process's address space can't
be trusted in some contexts (e.g. the target process may modify the
tracing code or its data), whereas ptrace() is pretty complete and
reliable, if ugly.

There's already a security model around who can use ptrace(); speeding
it up needn't break that.

If we'd had BPF ptrace in the first place, SECCOMP wouldn't have been
needed as userspace could have done it, with exactly the restrictions
it wants. Google's NaCl comes to mind as a potential user.

-- Jamie