Re: [REGRESSION] x86/entry: Tracer no longer has opportunity to change the syscall number at entry via orig_ax

From: Kees Cook
Date: Fri Sep 11 2020 - 14:58:45 EST


On Wed, Sep 09, 2020 at 11:53:42PM +1000, Michael Ellerman wrote:
> Hi Thomas,
>
> Sorry if this was discussed already somewhere, but I didn't see anything ...
>
> Thomas Gleixner <tglx@xxxxxxxxxxxxx> writes:
> > On Wed, Aug 19 2020 at 10:14, Kyle Huey wrote:
> >> tl;dr: after 27d6b4d14f5c3ab21c4aef87dd04055a2d7adf14 ptracer
> >> modifications to orig_ax in a syscall entry trace stop are not honored
> >> and this breaks our code.
> ...
> > diff --git a/kernel/entry/common.c b/kernel/entry/common.c
> > index 9852e0d62d95..fcae019158ca 100644
> > --- a/kernel/entry/common.c
> > +++ b/kernel/entry/common.c
> > @@ -65,7 +65,8 @@ static long syscall_trace_enter(struct pt_regs *regs, long syscall,
>
> Adding context:
>
> 	/* Do seccomp after ptrace, to catch any tracer changes. */
> 	if (ti_work & _TIF_SECCOMP) {
> 		ret = __secure_computing(NULL);
> 		if (ret == -1L)
> 			return ret;
> 	}
>
> 	if (unlikely(ti_work & _TIF_SYSCALL_TRACEPOINT))
> 		trace_sys_enter(regs, syscall);
>
> >  	syscall_enter_audit(regs, syscall);
> >
> > -	return ret ? : syscall;
> > +	/* The above might have changed the syscall number */
> > +	return ret ? : syscall_get_nr(current, regs);
> >  }
> >
> > noinstr long syscall_enter_from_user_mode(struct pt_regs *regs, long syscall)
>
> I noticed if the syscall number is changed by seccomp/ptrace, the
> original syscall number is still passed to trace_sys_enter() and audit.
>
> The old code used regs->orig_ax, so any change to the syscall number
> would be seen by the tracepoint and audit.

Ah! That's no good.

> I can observe the difference between v5.8 and mainline, using the
> raw_syscalls trace events and running the seccomp_bpf selftest, which
> turns a getpid (39) into a getppid (110).
>
> With v5.8 we see getppid on entry and exit:
>
> seccomp_bpf-1307 [000] .... 22974.874393: sys_enter: NR 110 (7ffff22c46e0, 40a350, 4, fffffffffffff7ab, 7fa6ee0d4010, 0)
> seccomp_bpf-1307 [000] .N.. 22974.874401: sys_exit: NR 110 = 1304
>
> Whereas on mainline we see an enter for getpid and an exit for getppid:
>
> seccomp_bpf-1030 [000] .... 21.806766: sys_enter: NR 39 (7ffe2f6d1ad0, 40a350, 7ffe2f6d1ad0, 0, 0, 407299)
> seccomp_bpf-1030 [000] .... 21.806767: sys_exit: NR 110 = 1027
>
>
> I don't know audit that well, but I think it saves the syscall number on
> entry, e.g. in __audit_syscall_entry(), so I think it will record the
> wrong syscall in this case.
>
> Seems like we should reload the syscall number before calling
> trace_sys_enter() & audit ?

Agreed. I wonder what the best way to build a regression test for this
is... hmmm.
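
As a starting point for thinking about such a test, here is a rough
user-space model of the ordering problem described above. It is only a
sketch, not kernel code: struct pt_regs_model, the mock_* helpers and
both trace_enter_* functions are hypothetical stand-ins, and the only
behaviour modelled is "seccomp/ptrace rewrites orig_ax, then the
tracepoint and audit are handed a syscall number".

```c
#include <assert.h>

/* Hypothetical stand-in for the register frame; only orig_ax is modelled. */
struct pt_regs_model {
	long orig_ax;
};

/* Numbers recorded by the mock tracepoint and the mock audit hook. */
static long traced_nr;
static long audited_nr;

/* Mock seccomp/ptrace step: rewrites getpid (39) into getppid (110). */
static long mock_secure_computing(struct pt_regs_model *regs)
{
	if (regs->orig_ax == 39)
		regs->orig_ax = 110;
	return 0;	/* 0 here means "let the syscall proceed" */
}

static void mock_trace_sys_enter(long nr) { traced_nr = nr; }
static void mock_audit(long nr) { audited_nr = nr; }

/* Ordering as in the quoted code: the stale local is passed on. */
static long trace_enter_stale(struct pt_regs_model *regs, long syscall)
{
	long ret = mock_secure_computing(regs);

	if (ret == -1L)
		return ret;

	mock_trace_sys_enter(syscall);
	mock_audit(syscall);

	/* Only the return value picks up the rewritten number. */
	return ret ? ret : regs->orig_ax;
}

/* Suggested ordering: reload the number before tracepoint and audit. */
static long trace_enter_reloaded(struct pt_regs_model *regs, long syscall)
{
	long ret = mock_secure_computing(regs);

	if (ret == -1L)
		return ret;

	syscall = regs->orig_ax;	/* pick up any tracer/seccomp change */
	mock_trace_sys_enter(syscall);
	mock_audit(syscall);

	return ret ? ret : syscall;
}

/* Checks both orderings against the getpid -> getppid rewrite. */
static void model_selfcheck(void)
{
	struct pt_regs_model stale = { .orig_ax = 39 };
	struct pt_regs_model fresh = { .orig_ax = 39 };

	assert(trace_enter_stale(&stale, 39) == 110);
	assert(traced_nr == 39 && audited_nr == 39);	/* the mismatch */

	assert(trace_enter_reloaded(&fresh, 39) == 110);
	assert(traced_nr == 110 && audited_nr == 110);	/* consistent */
}
```

Calling model_selfcheck() from a small main() exercises both orderings.
A real regression test would presumably have to compare what the
sys_enter tracepoint (or audit) records against what the tracer or
filter wrote, which this model does not attempt.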

--
Kees Cook