[RFC PATCH] seccomp: Add a trace action that allows the syscall if ptracer absent

From: Keno Fischer
Date: Sun Mar 22 2020 - 02:29:12 EST


# Background

One usecase for seccomp that has become more common is as a sort-of
pre-filter for ptrace events. This is useful to reduce debugging
overhead by avoiding traps to the ptracer for events that the
ptracer is not interested in and is used extensively for this
purpose in modern ptrace clients such as the rr debugger [1].

As a simpler, but illustrative example, consider perhaps an
expression like `strace -e open`. An advanced strace implementation
could set a seccomp filter that causes a SECCOMP_RET_TRACE option
for open system calls only (or even only system calls involving a
particular file descriptor).

However, when used for this purpose, the persistent and inherited
nature of the seccomp filter can present a problem (though
perfectly sensible from the perspective of a security feature of
course). Of course the ptracer could simply attach itself to the
entire process tree, but this of course presents its own performance
challenges negating the originall performance advantage (e.g.
in the `strace` example above, strace may only want to trace its
original child).

The most desirable feature would probably be for the ptracer to have
the capability to remove a seccomp filter from a child, but I
understand that the design of seccomp as an immutable tree of filters
makes this difficult, not to mention that such a feature is perhaps
undesirable given seccomp's intention as a security feature (even
if guarded behind a non-default flag). This instead implements an
additional trace action, that I believe can achieve much the same
effect while being significantly less invasive:

# This patch

The default behavior of the `SECCOMP_RET_TRACE` is to fail the system
call with `ENOSYS` if no ptracer is attached to the process. This
patch adds a similar option `SECCOMP_RET_TRACE_ALLOW`, that behaves
identically to `SECCOMP_RET_TRACE` in all respects, except that it
allows the system call to go forward unmodified in the case that no
ptracer is attached (or that the ptracer does not have the seccomp
event enabled).

This remedies the situation discussed in the previous section by making
it irrelevant whether or not the seccomp filter is installed in
the children of the original initial task, as they will not see any
difference in behavior from the presence of a properly-implemented
seccomp filter as long as no ptracer is attached.

I think another possible use of an interface like this would be ptracers
that want to occaisionally attach to running processes for profiling
or logging purposes, but otherwise do not want to disturb the
functioning of the process.

It is not entirely satisfying of course, as the new behavior will
be re-activated by any new ptracer attaching to such a process,
even if such a ptracer is unaware of the presence of the seccomp filter.
Such a situation is probably fairly rare (esp. since the alternative
is just to keep every child ptrace-attached, which would prevent
other ptracers from attaching anyway), but it deserves pointing out
nonetheless.

# Alternatives considered

As discussed above, I considered proposing some mechanism for the
removal of seccomp filters after the fact, but rejected this approach
because it seemed to invasive and counter to the design intent of the
API.

Another idea I considered was allowing the BPF filter to access some
ptracer-writable metadata area that could be used to modify the behavior
of the filter after the fact (if enabled by a flag). I still think
this would potentially be a viable solution. However, I worried that
this would get too far into the realm of allowing eBPF seccomp filters,
which as far as I understand was previously rejected (though I did
not completely research the history of this suggestion). A solution
like this would of course address the shortcoming I mentioned above
by allowing the ptracer to render the filter truly inert.

Lastly I considered instead adding a field to the seccomp_data
structure that would indicate whether or not the ptracer was available
or not. However, that approached seemed a lot more likely to result
in user confusion and incorrect filters than simply adding an additional
action.

[1] https://github.com/mozilla/rr

Signed-Off-By: Keno Fischer <keno@xxxxxxxxxxxxxxxxxx>

---
.../userspace-api/seccomp_filter.rst | 5 ++++
include/uapi/linux/seccomp.h | 1 +
kernel/seccomp.c | 26 +++++++++++++------
3 files changed, 24 insertions(+), 8 deletions(-)

diff --git a/Documentation/userspace-api/seccomp_filter.rst b/Documentation/userspace-api/seccomp_filter.rst
index bd9165241b6c..95098c20c3db 100644
--- a/Documentation/userspace-api/seccomp_filter.rst
+++ b/Documentation/userspace-api/seccomp_filter.rst
@@ -151,6 +151,11 @@ In precedence order, they are:
allow use of ptrace, even of other sandboxed processes, without
extreme care; ptracers can use this mechanism to escape.)

+``SECCOMP_RET_TRACE_ALLOW``:
+ Like ``SECCOMP_RET_TRACE``, but if no tracer is present (or the tracer is
+ not listening to seccomp events), allow the syscall rather than returning
+ ENOSYS.
+
``SECCOMP_RET_LOG``:
Results in the system call being executed after it is logged. This
should be used by application developers to learn which syscalls their
diff --git a/include/uapi/linux/seccomp.h b/include/uapi/linux/seccomp.h
index be84d87f1f46..7d9997a98d06 100644
--- a/include/uapi/linux/seccomp.h
+++ b/include/uapi/linux/seccomp.h
@@ -39,6 +39,7 @@
#define SECCOMP_RET_ERRNO 0x00050000U /* returns an errno */
#define SECCOMP_RET_USER_NOTIF 0x7fc00000U /* notifies userspace */
#define SECCOMP_RET_TRACE 0x7ff00000U /* pass to a tracer or disallow */
+#define SECCOMP_RET_TRACE_ALLOW 0x7ff30000U /* pass to a tracer or allow */
#define SECCOMP_RET_LOG 0x7ffc0000U /* allow after logging */
#define SECCOMP_RET_ALLOW 0x7fff0000U /* allow */

diff --git a/kernel/seccomp.c b/kernel/seccomp.c
index b6ea3dcb57bf..af7d38666b6c 100644
--- a/kernel/seccomp.c
+++ b/kernel/seccomp.c
@@ -623,6 +623,7 @@ static void seccomp_send_sigsys(int syscall, int reason)
#define SECCOMP_LOG_LOG (1 << 5)
#define SECCOMP_LOG_ALLOW (1 << 6)
#define SECCOMP_LOG_USER_NOTIF (1 << 7)
+#define SECCOMP_LOG_TRACE_ALLOW (1 << 8)

static u32 seccomp_actions_logged = SECCOMP_LOG_KILL_PROCESS |
SECCOMP_LOG_KILL_THREAD |
@@ -630,6 +631,7 @@ static u32 seccomp_actions_logged = SECCOMP_LOG_KILL_PROCESS |
SECCOMP_LOG_ERRNO |
SECCOMP_LOG_USER_NOTIF |
SECCOMP_LOG_TRACE |
+ SECCOMP_LOG_TRACE_ALLOW |
SECCOMP_LOG_LOG;

static inline void seccomp_log(unsigned long syscall, long signr, u32 action,
@@ -646,6 +648,7 @@ static inline void seccomp_log(unsigned long syscall, long signr, u32 action,
case SECCOMP_RET_ERRNO:
log = requested && seccomp_actions_logged & SECCOMP_LOG_ERRNO;
break;
+ case SECCOMP_RET_TRACE_ALLOW:
case SECCOMP_RET_TRACE:
log = requested && seccomp_actions_logged & SECCOMP_LOG_TRACE;
break;
@@ -832,6 +835,7 @@ static int __seccomp_filter(int this_syscall, const struct seccomp_data *sd,
seccomp_send_sigsys(this_syscall, data);
goto skip;

+ case SECCOMP_RET_TRACE_ALLOW:
case SECCOMP_RET_TRACE:
/* We've been put in this state by the ptracer already. */
if (recheck_after_trace)
@@ -839,9 +843,12 @@ static int __seccomp_filter(int this_syscall, const struct seccomp_data *sd,

/* ENOSYS these calls if there is no tracer attached. */
if (!ptrace_event_enabled(current, PTRACE_EVENT_SECCOMP)) {
+ if (action == SECCOMP_RET_TRACE_ALLOW)
+ return 0;
+
syscall_set_return_value(current,
- task_pt_regs(current),
- -ENOSYS, 0);
+ task_pt_regs(current),
+ -ENOSYS, 0);
goto skip;
}

@@ -1374,6 +1381,7 @@ static long seccomp_get_action_avail(const char __user *uaction)
case SECCOMP_RET_TRAP:
case SECCOMP_RET_ERRNO:
case SECCOMP_RET_USER_NOTIF:
+ case SECCOMP_RET_TRACE_ALLOW:
case SECCOMP_RET_TRACE:
case SECCOMP_RET_LOG:
case SECCOMP_RET_ALLOW:
@@ -1591,12 +1599,13 @@ long seccomp_get_metadata(struct task_struct *task,
/* Human readable action names for friendly sysctl interaction */
#define SECCOMP_RET_KILL_PROCESS_NAME "kill_process"
#define SECCOMP_RET_KILL_THREAD_NAME "kill_thread"
-#define SECCOMP_RET_TRAP_NAME "trap"
-#define SECCOMP_RET_ERRNO_NAME "errno"
-#define SECCOMP_RET_USER_NOTIF_NAME "user_notif"
-#define SECCOMP_RET_TRACE_NAME "trace"
-#define SECCOMP_RET_LOG_NAME "log"
-#define SECCOMP_RET_ALLOW_NAME "allow"
+#define SECCOMP_RET_TRAP_NAME "trap"
+#define SECCOMP_RET_ERRNO_NAME "errno"
+#define SECCOMP_RET_USER_NOTIF_NAME "user_notif"
+#define SECCOMP_RET_TRACE_NAME "trace"
+#define SECCOMP_RET_TRACE_ALLOW_NAME "trace_allow"
+#define SECCOMP_RET_LOG_NAME "log"
+#define SECCOMP_RET_ALLOW_NAME "allow"

static const char seccomp_actions_avail[] =
SECCOMP_RET_KILL_PROCESS_NAME " "
@@ -1620,6 +1629,7 @@ static const struct seccomp_log_name seccomp_log_names[] = {
{ SECCOMP_LOG_ERRNO, SECCOMP_RET_ERRNO_NAME },
{ SECCOMP_LOG_USER_NOTIF, SECCOMP_RET_USER_NOTIF_NAME },
{ SECCOMP_LOG_TRACE, SECCOMP_RET_TRACE_NAME },
+ { SECCOMP_LOG_TRACE_ALLOW, SECCOMP_RET_TRACE_ALLOW_NAME },
{ SECCOMP_LOG_LOG, SECCOMP_RET_LOG_NAME },
{ SECCOMP_LOG_ALLOW, SECCOMP_RET_ALLOW_NAME },
{ }
--
2.24.0