Fwd: Re: Capabilities

Andrew Morgan (morgan@transmeta.com)
Sat, 27 Jun 1998 17:21:50 -0700


[Cc'd to linux-kernel since I think it is of general interest.
Others can find libcap here:

ftp://linux.kernel.org/pub/linux/libs/security/linux-privs/kernel-2.1/

The latest version of libcap should work with the most recent kernel.]

Winfried Truemper writes:
> Hi Andrew,
>
> could you try to explain the usage of execcap from libcap?
> If I want to make my (non-setuid-0) nameserver be able to
> bind to port 53, do I have to issue
>
> execcap CAP_NET_BIND_SERVICE su named /usr/sbin/named
>
> to make it run as user "named"?

As it stands, with no filesystem support for capabilities, you cannot
do this. Leveraging the "backward compatibility" code in the kernel
(which basically comes down to using uid=0 to provide an effective
filesystem "Exec" capability), what you can do is the following:

[root@godzilla progs]# ./execcap cap_net_bind_service=i sleep 1000 &
[1] 600
[root@godzilla progs]# cat /proc/600/status
Name: sleep
State: S (sleeping)
Pid: 600
PPid: 587
Uid: 0 0 0 0
Gid: 0 0 0 0
VmSize: 872 kB
VmLck: 0 kB
VmRSS: 284 kB
VmData: 204 kB
VmStk: 8 kB
VmExe: 20 kB
VmLib: 608 kB
SigPnd: 0000000000000000
SigBlk: 0000000000000000
SigIgn: 0000000080000000
SigCat: 0000000000002000
CapInh: 0000000000000400
CapPrm: 0000000000000400
CapEff: 0000000000000400

Thus, named (here, I use sleep) inherits an interesting state:

 * it is run as root (it is "only" able to manipulate files
   owned by the root user).

 * it has no privilege beyond binding to a port below 1024
   (in other words, it can't change firewall rules, delete
   files root does not own, change its uid, mount
   filesystems...)
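
The CapInh/CapPrm/CapEff values in the status output are hexadecimal
bitmasks, one bit per capability. As a rough sketch of how to read
them (the helper names decode_caps and parse_status_caps are
hypothetical, and only a handful of the bit numbers from
<linux/capability.h> are listed), something like this would do:

```python
# Bit numbers as defined in <linux/capability.h>; only a few are listed.
CAP_NAMES = {
    0: "cap_chown",
    7: "cap_setuid",
    10: "cap_net_bind_service",
    12: "cap_net_admin",
    21: "cap_sys_admin",
}


def decode_caps(hex_mask):
    """Return the names of the capability bits set in a hex mask string."""
    mask = int(hex_mask, 16)
    names = []
    bit = 0
    while mask >> bit:
        if (mask >> bit) & 1:
            # Bits missing from the table are reported by number.
            names.append(CAP_NAMES.get(bit, "bit_%d" % bit))
        bit += 1
    return names


def parse_status_caps(status_text):
    """Pick the CapInh/CapPrm/CapEff lines out of /proc/<pid>/status text."""
    caps = {}
    for line in status_text.splitlines():
        key, _, value = line.partition(":")
        if key in ("CapInh", "CapPrm", "CapEff"):
            caps[key] = decode_caps(value.strip())
    return caps


sample = """CapInh: 0000000000000400
CapPrm: 0000000000000400
CapEff: 0000000000000400"""
print(parse_status_caps(sample))
```

0x400 is bit 10, so all three sets decode to cap_net_bind_service and
nothing else, which is the state described above.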

This is arguably more secure than a more traditional unix system,
where being able to bind to a port below 1024 carried with it rights
to do other dangerous things with "superuser privilege".

However, it is not a defensibly secure state....

> How secure is this with regard to the coarse "bind to everything below
> 1024" permission? In other words: is it planed to have capabilities
> for single ports? This would take 10 bits from the 128.

You could implement this with 10 extra bits, but it does not scale
well: how would you enforce the ability to bind to two and only two
ports below 1024? Or three? ...
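
To put numbers on the scaling problem, here is a back-of-the-envelope
sketch. It is my own arithmetic, not a proposed kernel design; the
function names are hypothetical, and the 128-bit budget is simply the
figure taken from the question.

```python
PRIVILEGED_PORTS = 1024   # ports 0..1023
CAPABILITY_BITS = 128     # the budget mentioned in the question


def bits_for_listed_ports(count):
    """Bits needed to name `count` specific ports, one port number each."""
    bits_per_port = (PRIVILEGED_PORTS - 1).bit_length()  # 10 bits for 0..1023
    return count * bits_per_port


def bits_for_any_subset():
    """Bits needed to allow an arbitrary set of ports: one bit per port."""
    return PRIVILEGED_PORTS


for count in (1, 2, 3):
    print(count, "port(s):", bits_for_listed_ports(count), "bits")
print("any subset:", bits_for_any_subset(), "bits")
print("ports that fit in the budget:",
      CAPABILITY_BITS // bits_for_listed_ports(1))
```

One port costs 10 bits, two cost 20, three cost 30, and a bitmap that
can express any combination needs 1024 bits, eight times the whole
128-bit budget.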

All we have done is attempt to limit the damage that can be done by
the named process (and its descendants). Unfortunately, the ability to
bind to an arbitrary port and the ability to delete a file owned by
root, both of which named (or sleep here) still possesses, are
extremely dangerous on the average Linux system... [From your question,
it is clear that you appreciate this fact.]

To alleviate the latter concern, we could place the named process in a
chroot cell and avoid having any files there owned by root. (Make them
all owned by 'bin' or some other non-interactive user).
Unfortunately, the ability to bind to an arbitrary port gives an
attacker a path to hijack another port and thus intercept passwords
etc., from incoming telnet connections. Armed with that sort
of information, the attacker would be able to enter the system by a
more profitable route.

So what can you do?

The three 'progs' in the libcap distribution are intended purely as
(mostly useful) examples. If I had the time, I would create a patch
for named that does the following:

    main()
    {
        bind_to_port;
    +   set_eip_capabilities_to_0;
        /* do what named does... */
    }

Having done this, I would invoke named with an execcap+chroot wrapper
and limit the window of opportunity for manipulating named for evil
purposes to the distance from execcap's exec("named") to /* do what
named does */. Since named does not establish any network connections
within this window, it is reasonable to assume that you are only left
with the worry of making sure there are no files owned by root in your
chroot sub-system.
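
The ordering matters: bind first, then clear the effective,
inheritable and permitted sets. The following is only a toy model of
that sequence. ToyProcess and start_named are hypothetical names, and
nothing here talks to the real kernel or to libcap; it just shows why
the window closes once the sets are cleared.

```python
CAP_NET_BIND_SERVICE = 1 << 10  # same bit as the 0x400 mask in the status output


class ToyProcess:
    """Hypothetical stand-in for a process; it does not touch the kernel."""

    def __init__(self, caps):
        self.effective = caps
        self.inheritable = caps
        self.permitted = caps
        self.bound_ports = []

    def bind(self, port):
        # Ports below 1024 need cap_net_bind_service in the effective set.
        if port < 1024 and not (self.effective & CAP_NET_BIND_SERVICE):
            raise PermissionError("cannot bind to privileged port %d" % port)
        self.bound_ports.append(port)

    def drop_all_capabilities(self):
        # The set_eip_capabilities_to_0 step: clear e, i and p together.
        self.effective = 0
        self.inheritable = 0
        self.permitted = 0


def start_named(proc):
    proc.bind(53)                 # bind_to_port
    proc.drop_all_capabilities()  # set_eip_capabilities_to_0
    # ... do what named does ...
    return proc


named = start_named(ToyProcess(CAP_NET_BIND_SERVICE))
try:
    named.bind(23)  # an attempt to grab the telnet port after the drop
except PermissionError as err:
    print("refused:", err)
```

In this model the process keeps port 53 but can no longer take over
any other privileged port once the capabilities are gone.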

The bottom line is that execcap can only be used to enhance the
security of your system. With filesystem capabilities (not currently
implemented) you would be able to invoke named as a different user and
avoid the need for the execcap-wrapper altogether.

Limiting access to individual ports from exec() is probably better
handled with port access control lists... I have not followed that
project, so if I were you, I would follow up on the comment in
linux/Documentation/ioctl-number.txt (search for 'Port ACL').

I hope that helped.

Cheers

Andrew

PS. Just for fun, as root on a recent kernel, try:

execcap cap_net_bind_service=i su nobody -c "sleep 1000"

I think you'll discover it fails. ;^)
