Re: [PATCH v8 1/2] seccomp: add a return code to trap to userspace

From: Tycho Andersen
Date: Tue Oct 30 2018 - 11:32:40 EST


Hi Oleg,

On Tue, Oct 30, 2018 at 03:32:36PM +0100, Oleg Nesterov wrote:
> On 10/29, Tycho Andersen wrote:
> >
> > + /* This is where we wait for a reply from userspace. */
> > + err = wait_for_completion_interruptible(&n.ready);
> > + mutex_lock(&match->notify_lock);
> > +
> > + /*
> > + * If the noticiation fd died before we re-acquired the lock, we still
> > + * give -ENOSYS.
> > + */
> > + if (!match->notif)
> > + goto remove_list;
> > +
> > + /*
> > + * Here it's possible we got a signal and then had to wait on the mutex
> > + * while the reply was sent, so let's be sure there wasn't a response
> > + * in the meantime.
> > + */
> > + if (err < 0 && n.state != SECCOMP_NOTIFY_REPLIED) {
> > + /*
> > + * We got a signal. Let's tell userspace about it (potentially
> > + * again, if we had already notified them about the first one).
> > + */
> > + n.signaled = true;
> > + if (n.state == SECCOMP_NOTIFY_SENT) {
> > + n.state = SECCOMP_NOTIFY_INIT;
> > + up(&match->notif->request);
> > + }
>
> I am not sure I understand the value of signaled/SECCOMP_NOTIF_FLAG_SIGNALED...
> I mean, why it is actually useful?
>
> Sorry if this was already discussed.

:) no problem, many people have complained about this. This is an
implementation of Andy's suggestion here:
https://lkml.org/lkml/2018/3/15/1122

You can see some more detailed discussion here:
https://lkml.org/lkml/2018/9/21/138

> > + wake_up_poll(&match->notif->wqh, EPOLLIN | EPOLLRDNORM);
> > +
> > + mutex_unlock(&match->notify_lock);
> > + err = wait_for_completion_killable(&n.ready);
> > + mutex_lock(&match->notify_lock);
>
> And it seems that SECCOMP_NOTIF_FLAG_SIGNALED is the only reason why
> seccomp_do_user_notification() doesn't do wait_for_completion_killable() from
> the very beginning.
>
> But my main concern is that either way wait_for_completion_killable() allows
> to trivially create a process which doesn't react to SIGSTOP, not good...
>
> Note also that this can happen if, say, both the tracer and tracee run in the
> same process group and SIGSTOP is sent to their pgid, if the tracer gets the
> signal first the tracee won't stop.
>
> Of freezer. try_to_freeze_tasks() can fail if it freezes the tracer before
> it does SECCOMP_IOCTL_NOTIF_SEND.

I think in general the way this is intended to be used these things
wouldn't happen. Of course, it would be pretty easy for someone who
was malicious and had the ability to create a user namespace to
exhaust pids this way, so perhaps we should drop this part of the
patch. I have no real need for it, but perhaps Andy can elaborate?

Tycho