Re: [PATCH] make TIOCSTI ioctl require CAP_SYS_ADMIN

From: Matt Brown
Date: Sat Apr 22 2017 - 13:10:33 EST


On 04/21/2017 01:24 AM, Serge E. Hallyn wrote:
On Fri, Apr 21, 2017 at 01:09:59AM -0400, Matt Brown wrote:
On 04/20/2017 01:41 PM, Serge E. Hallyn wrote:
Quoting matt@xxxxxxxxx (matt@xxxxxxxxx):
On 2017-04-20 11:19, Serge E. Hallyn wrote:
Quoting Matt Brown (matt@xxxxxxxxx):
On 04/19/2017 07:53 PM, Serge E. Hallyn wrote:
Quoting Matt Brown (matt@xxxxxxxxx):
On 04/19/2017 12:58 AM, Serge E. Hallyn wrote:
On Tue, Apr 18, 2017 at 11:45:26PM -0400, Matt Brown wrote:
This patch reproduces GRKERNSEC_HARDEN_TTY functionality from the grsecurity
project in-kernel.

This will create the Kconfig SECURITY_TIOCSTI_RESTRICT and the corresponding
sysctl kernel.tiocsti_restrict that, when activated, restrict all TIOCSTI
ioctl calls from non CAP_SYS_ADMIN users.

Possible effects on userland:

There could be a few user programs that would be effected by this
change.
See: <https://codesearch.debian.net/search?q=ioctl%5C%28.*TIOCSTI>
notable programs are: agetty, csh, xemacs and tcsh

However, I still believe that this change is worth it given that the
Kconfig defaults to n. This will be a feature that is turned on for the

It's not worthless, but note that for instance before this was fixed
in lxc, this patch would not have helped with escapes from privileged
containers.


I assume you are talking about this CVE:
https://bugzilla.redhat.com/show_bug.cgi?id=1411256

In retrospect, is there any way that an escape from a privileged
container with the this bug could have been prevented?

I don't know, that's what I was probing for. Detecting that the pgrp
or session - heck, the pid namespace - has changed would seem like a
good indicator that it shouldn't be able to push.


pgrp and session won't do because in the case we are discussing
current->signal->tty is the same as tty.

This is the current check that is already in place:
| if ((current->signal->tty != tty) && !capable(CAP_SYS_ADMIN))
| return -EPERM;

Yeah...

The only thing I could find to detect the tty message coming from a
container is as follows:
| task_active_pid_ns(current)->level

This will be zero when run on the host, but 1 when run inside a
container. However this is very much a hack and could probably break
some userland stuff where there are multiple levels of namespaces.

Yes. This is also however why I don't like the current patch, because
capable() will never be true in a container, so nested containers
break.


What do you mean by "capable() will never be true in a container"?
My understanding
is that if a container is given CAP_SYS_ADMIN then
capable(CAP_SYS_ADMIN) will return
true?

No, capable(X) checks for X with respect to the initial user namespace.
So for root-owned containers it will be true, but containers running in
non-initial user namespaces cannot pass that check.

To check for privilege with respect to another user namespace, you need
to use ns_capable. But for that you need a user_ns to target.


How about: ns_capable(current_user_ns(),CAP_SYS_ADMIN) ?

current_user_ns() was found in include/linux/cred.h

Any user can create a new user namespace and pass the above check. What we
want is to find the user namespace which opened the tty.


I believe I have a working solution that I can show in the next version
of the patch later today, but I just want to run the logic by you first.

I added: "struct user_namespace *owner_user_ns;" as a field in
tty_struct (include/linux/tty.h) Note: I am totally open to suggestions
for a better name.

Then I added "tty->owner_user_ns = current_user_ns();" to the
alloc_tty_struct function. (drivers/tty/tty_io.c)

When testing with a docker container, running in a different user
namespace, I printed out current_user_ns()->level, which returned 1,
and tty->owner_user_ns->level, which returned 0. This seems to prove
that I am correctly storing the user namespace which opened the tty.

Please let me know if there are any edge cases that I am missing with
this approach.