Re: [RFC PATCH] seccomp: Add a trace action that allows the syscall if ptracer absent

From: Andy Lutomirski
Date: Sun Mar 22 2020 - 16:27:05 EST




> On Mar 21, 2020, at 11:20 PM, Keno Fischer <keno@xxxxxxxxxxxxxxxxxx> wrote:
>
> ï# Background
>
> One usecase for seccomp that has become more common is as a sort-of
> pre-filter for ptrace events. This is useful to reduce debugging
> overhead by avoiding traps to the ptracer for events that the
> ptracer is not interested in and is used extensively for this
> purpose in modern ptrace clients such as the rr debugger [1].

Hmm. My impression is that this patch takes an existing so-so trick and makes it a bit weirder and a bit less broken but still quite hackish.

I suggest a different solution: add something like PTRACE_BPF: it resumes the target and passes each event through a BPF program. The BPF program can decide whether to ignore the event or trap. Or log it!

Sadly, unprivileged eBPF may still be stalled, so you may have to fudge it with classic BPF, which is doable but more awkward and less powerful.

>
> As a simpler, but illustrative example, consider perhaps an
> expression like `strace -e open`. An advanced strace implementation
> could set a seccomp filter that causes a SECCOMP_RET_TRACE option
> for open system calls only (or even only system calls involving a
> particular file descriptor).
>
> However, when used for this purpose, the persistent and inherited
> nature of the seccomp filter can present a problem (though
> perfectly sensible from the perspective of a security feature of
> course). Of course the ptracer could simply attach itself to the
> entire process tree, but this of course presents its own performance
> challenges negating the originall performance advantage (e.g.
> in the `strace` example above, strace may only want to trace its
> original child).
>
> The most desirable feature would probably be for the ptracer to have
> the capability to remove a seccomp filter from a child, but I
> understand that the design of seccomp as an immutable tree of filters
> makes this difficult, not to mention that such a feature is perhaps
> undesirable given seccomp's intention as a security feature (even
> if guarded behind a non-default flag). This instead implements an
> additional trace action, that I believe can achieve much the same
> effect while being significantly less invasive:
>
> # This patch
>
> The default behavior of the `SECCOMP_RET_TRACE` is to fail the system
> call with `ENOSYS` if no ptracer is attached to the process. This
> patch adds a similar option `SECCOMP_RET_TRACE_ALLOW`, that behaves
> identically to `SECCOMP_RET_TRACE` in all respects, except that it
> allows the system call to go forward unmodified in the case that no
> ptracer is attached (or that the ptracer does not have the seccomp
> event enabled).
>
> This remedies the situation discussed in the previous section by making
> it irrelevant whether or not the seccomp filter is installed in
> the children of the original initial task, as they will not see any
> difference in behavior from the presence of a properly-implemented
> seccomp filter as long as no ptracer is attached.
>
> I think another possible use of an interface like this would be ptracers
> that want to occaisionally attach to running processes for profiling
> or logging purposes, but otherwise do not want to disturb the
> functioning of the process.
>
> It is not entirely satisfying of course, as the new behavior will
> be re-activated by any new ptracer attaching to such a process,
> even if such a ptracer is unaware of the presence of the seccomp filter.
> Such a situation is probably fairly rare (esp. since the alternative
> is just to keep every child ptrace-attached, which would prevent
> other ptracers from attaching anyway), but it deserves pointing out
> nonetheless.
>
> # Alternatives considered
>
> As discussed above, I considered proposing some mechanism for the
> removal of seccomp filters after the fact, but rejected this approach
> because it seemed to invasive and counter to the design intent of the
> API.
>
> Another idea I considered was allowing the BPF filter to access some
> ptracer-writable metadata area that could be used to modify the behavior
> of the filter after the fact (if enabled by a flag). I still think
> this would potentially be a viable solution. However, I worried that
> this would get too far into the realm of allowing eBPF seccomp filters,
> which as far as I understand was previously rejected (though I did
> not completely research the history of this suggestion). A solution
> like this would of course address the shortcoming I mentioned above
> by allowing the ptracer to render the filter truly inert.
>
> Lastly I considered instead adding a field to the seccomp_data
> structure that would indicate whether or not the ptracer was available
> or not. However, that approached seemed a lot more likely to result
> in user confusion and incorrect filters than simply adding an additional
> action.
>
> [1] https://github.com/mozilla/rr
>
> Signed-Off-By: Keno Fischer <keno@xxxxxxxxxxxxxxxxxx>
>
> ---
> .../userspace-api/seccomp_filter.rst | 5 ++++
> include/uapi/linux/seccomp.h | 1 +
> kernel/seccomp.c | 26 +++++++++++++------
> 3 files changed, 24 insertions(+), 8 deletions(-)
>
> diff --git a/Documentation/userspace-api/seccomp_filter.rst b/Documentation/userspace-api/seccomp_filter.rst
> index bd9165241b6c..95098c20c3db 100644
> --- a/Documentation/userspace-api/seccomp_filter.rst
> +++ b/Documentation/userspace-api/seccomp_filter.rst
> @@ -151,6 +151,11 @@ In precedence order, they are:
> allow use of ptrace, even of other sandboxed processes, without
> extreme care; ptracers can use this mechanism to escape.)
>
> +``SECCOMP_RET_TRACE_ALLOW``:
> + Like ``SECCOMP_RET_TRACE``, but if no tracer is present (or the tracer is
> + not listening to seccomp events), allow the syscall rather than returning
> + ENOSYS.
> +
> ``SECCOMP_RET_LOG``:
> Results in the system call being executed after it is logged. This
> should be used by application developers to learn which syscalls their
> diff --git a/include/uapi/linux/seccomp.h b/include/uapi/linux/seccomp.h
> index be84d87f1f46..7d9997a98d06 100644
> --- a/include/uapi/linux/seccomp.h
> +++ b/include/uapi/linux/seccomp.h
> @@ -39,6 +39,7 @@
> #define SECCOMP_RET_ERRNO 0x00050000U /* returns an errno */
> #define SECCOMP_RET_USER_NOTIF 0x7fc00000U /* notifies userspace */
> #define SECCOMP_RET_TRACE 0x7ff00000U /* pass to a tracer or disallow */
> +#define SECCOMP_RET_TRACE_ALLOW 0x7ff30000U /* pass to a tracer or allow */
> #define SECCOMP_RET_LOG 0x7ffc0000U /* allow after logging */
> #define SECCOMP_RET_ALLOW 0x7fff0000U /* allow */
>
> diff --git a/kernel/seccomp.c b/kernel/seccomp.c
> index b6ea3dcb57bf..af7d38666b6c 100644
> --- a/kernel/seccomp.c
> +++ b/kernel/seccomp.c
> @@ -623,6 +623,7 @@ static void seccomp_send_sigsys(int syscall, int reason)
> #define SECCOMP_LOG_LOG (1 << 5)
> #define SECCOMP_LOG_ALLOW (1 << 6)
> #define SECCOMP_LOG_USER_NOTIF (1 << 7)
> +#define SECCOMP_LOG_TRACE_ALLOW (1 << 8)
>
> static u32 seccomp_actions_logged = SECCOMP_LOG_KILL_PROCESS |
> SECCOMP_LOG_KILL_THREAD |
> @@ -630,6 +631,7 @@ static u32 seccomp_actions_logged = SECCOMP_LOG_KILL_PROCESS |
> SECCOMP_LOG_ERRNO |
> SECCOMP_LOG_USER_NOTIF |
> SECCOMP_LOG_TRACE |
> + SECCOMP_LOG_TRACE_ALLOW |
> SECCOMP_LOG_LOG;
>
> static inline void seccomp_log(unsigned long syscall, long signr, u32 action,
> @@ -646,6 +648,7 @@ static inline void seccomp_log(unsigned long syscall, long signr, u32 action,
> case SECCOMP_RET_ERRNO:
> log = requested && seccomp_actions_logged & SECCOMP_LOG_ERRNO;
> break;
> + case SECCOMP_RET_TRACE_ALLOW:
> case SECCOMP_RET_TRACE:
> log = requested && seccomp_actions_logged & SECCOMP_LOG_TRACE;
> break;
> @@ -832,6 +835,7 @@ static int __seccomp_filter(int this_syscall, const struct seccomp_data *sd,
> seccomp_send_sigsys(this_syscall, data);
> goto skip;
>
> + case SECCOMP_RET_TRACE_ALLOW:
> case SECCOMP_RET_TRACE:
> /* We've been put in this state by the ptracer already. */
> if (recheck_after_trace)
> @@ -839,9 +843,12 @@ static int __seccomp_filter(int this_syscall, const struct seccomp_data *sd,
>
> /* ENOSYS these calls if there is no tracer attached. */
> if (!ptrace_event_enabled(current, PTRACE_EVENT_SECCOMP)) {
> + if (action == SECCOMP_RET_TRACE_ALLOW)
> + return 0;
> +
> syscall_set_return_value(current,
> - task_pt_regs(current),
> - -ENOSYS, 0);
> + task_pt_regs(current),
> + -ENOSYS, 0);
> goto skip;
> }
>
> @@ -1374,6 +1381,7 @@ static long seccomp_get_action_avail(const char __user *uaction)
> case SECCOMP_RET_TRAP:
> case SECCOMP_RET_ERRNO:
> case SECCOMP_RET_USER_NOTIF:
> + case SECCOMP_RET_TRACE_ALLOW:
> case SECCOMP_RET_TRACE:
> case SECCOMP_RET_LOG:
> case SECCOMP_RET_ALLOW:
> @@ -1591,12 +1599,13 @@ long seccomp_get_metadata(struct task_struct *task,
> /* Human readable action names for friendly sysctl interaction */
> #define SECCOMP_RET_KILL_PROCESS_NAME "kill_process"
> #define SECCOMP_RET_KILL_THREAD_NAME "kill_thread"
> -#define SECCOMP_RET_TRAP_NAME "trap"
> -#define SECCOMP_RET_ERRNO_NAME "errno"
> -#define SECCOMP_RET_USER_NOTIF_NAME "user_notif"
> -#define SECCOMP_RET_TRACE_NAME "trace"
> -#define SECCOMP_RET_LOG_NAME "log"
> -#define SECCOMP_RET_ALLOW_NAME "allow"
> +#define SECCOMP_RET_TRAP_NAME "trap"
> +#define SECCOMP_RET_ERRNO_NAME "errno"
> +#define SECCOMP_RET_USER_NOTIF_NAME "user_notif"
> +#define SECCOMP_RET_TRACE_NAME "trace"
> +#define SECCOMP_RET_TRACE_ALLOW_NAME "trace_allow"
> +#define SECCOMP_RET_LOG_NAME "log"
> +#define SECCOMP_RET_ALLOW_NAME "allow"
>
> static const char seccomp_actions_avail[] =
> SECCOMP_RET_KILL_PROCESS_NAME " "
> @@ -1620,6 +1629,7 @@ static const struct seccomp_log_name seccomp_log_names[] = {
> { SECCOMP_LOG_ERRNO, SECCOMP_RET_ERRNO_NAME },
> { SECCOMP_LOG_USER_NOTIF, SECCOMP_RET_USER_NOTIF_NAME },
> { SECCOMP_LOG_TRACE, SECCOMP_RET_TRACE_NAME },
> + { SECCOMP_LOG_TRACE_ALLOW, SECCOMP_RET_TRACE_ALLOW_NAME },
> { SECCOMP_LOG_LOG, SECCOMP_RET_LOG_NAME },
> { SECCOMP_LOG_ALLOW, SECCOMP_RET_ALLOW_NAME },
> { }
> --
> 2.24.0
>