RE: UID is to capabilities as SUID is to ??; access(), # CAPs?

From: Jesse Pollard
Date: Sat May 27 2000 - 10:30:30 EST

On Fri, 26 May 2000, Linda Walsh wrote:
>> >I believe that the 'access' system call should check each pathname
>> >component to ensure the 'real uid' has at least 'x' for every pathname
>> >component except the last. (???)
>> Personally, I think this really calls for a different approach (a 2.5
>> implementation):
> Why? How does this differ from the current implementation (other
>than it using 'fsuid' -- which could be different)? I.e. my, perhaps flawed,
>understanding is that this is pretty much how it currently is.

The current implementation makes a blind call to access with no parameters.
The parameters are global and are read from the current process. That is
what makes it necessary to "save the ?uid, and restore the ?uid". I find this
an error-prone and limiting method. It requires locks to prevent another,
possibly parallel, test (on an alternate CPU) from being processed against a
different file or filesystem. The locks are placed on the process structure,
which must be common to all tests. If the access function were passed the
necessary parameters, they could be selected from the context of the call
and would not necessarily be tied to the process tree. One example of the
problems this causes is NFS ownership protection: the daemon/kernel has to
switch uid/gid, which opens a DoS attack in which the user kills the
daemon (hence the need for a kernel version of NFS, which I personally
don't care for).
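
To make the "pass the parameters" idea concrete, here is a minimal user-space
sketch in Python. It is only a model, not kernel code: the names Creds, Inode
and permitted are hypothetical, and capability overrides (including any special
treatment of uid 0) are deliberately left out. The point is that the check
reads nothing but its arguments, so there is no uid to save and restore.

```python
from dataclasses import dataclass

# Hypothetical model for illustration; none of these names exist in the kernel.
R, W, X = 4, 2, 1  # classic rwx permission bits


@dataclass(frozen=True)
class Creds:
    uid: int
    gids: frozenset


@dataclass(frozen=True)
class Inode:
    owner: int
    group: int
    mode: int  # e.g. 0o640


def permitted(creds: Creds, inode: Inode, want: int) -> bool:
    """Plain DAC check that uses only its arguments, never process-global state."""
    if creds.uid == inode.owner:
        bits = (inode.mode >> 6) & 7   # owner bits
    elif inode.group in creds.gids:
        bits = (inode.mode >> 3) & 7   # group bits
    else:
        bits = inode.mode & 7          # other bits
    return (bits & want) == want
```

With this shape, a server acting for a client would call something like
permitted(client_creds, inode, R | W) without ever changing its own identity,
and two such checks can run in parallel because they share no mutable state.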

<rant on :-) >
I'm not fond of the current division of security control made between
the VFS and FS layers anyway. I think the security control should be
more completely in a "reference monitor" (quotes intentional). The
security access control request should go to the reference monitor (not
the VFS/FS). Then the reference monitor can ask for the permissions/capability
entry for the object, perform the evaluation, and return the result.
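
A rough sketch of that split, again as a Python model with hypothetical names
(Entry, ReferenceMonitor, register, check) and a deliberately simplified
owner/other permission entry: each filesystem only hands back the stored
entry, and the single evaluation rule lives in the monitor.

```python
from typing import Callable, Dict, NamedTuple


class Entry(NamedTuple):
    owner: int  # owning uid
    mode: int   # owner bits in 0o700, everyone else in 0o007 (group omitted)


class ReferenceMonitor:
    """Single place where access decisions are evaluated (illustrative only)."""

    def __init__(self) -> None:
        self._providers: Dict[str, Callable[[str], Entry]] = {}

    def register(self, fs_name: str, get_entry: Callable[[str], Entry]) -> None:
        # A filesystem contributes only a lookup for its permission entries.
        self._providers[fs_name] = get_entry

    def check(self, uid: int, fs_name: str, path: str, want: int) -> bool:
        entry = self._providers[fs_name](path)  # filesystem supplies data only
        if uid == entry.owner:
            bits = (entry.mode >> 6) & 7
        else:
            bits = entry.mode & 7
        return (bits & want) == want            # evaluation happens here, once
```

A filesystem in this model cannot say "yes" on its own; it can only report
what is stored for the object, so a change to the rule is made in one place.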

The current FS "access" function does too much, and spreads the interpretation
of the security control across each FS implementation. If there is an error
in any one of them, then the controls in all of them could be bypassed (a
weakness in one applies to all). An example of this type of failure is the
protection bits on the DOS file systems. If the DOS FS has setuid bits set,
then I can get the setuid bit set on any file just by copying it to that file
system. Another is the NFS chown of a file: many times I have disabled a
user's chown ability on a system (to prevent getting around quota limits),
only to have it still happen because NFS allowed it. A specific example:
        Cray systems traditionally do not allow mv to move directories,
                other than renaming them within the current directory.
        An NFS mount of that same file system does allow such activity.
(in case that isn't clear)
        a/b/c -> move directory c from directory b into directory a
                -> a/c
        This action is not allowed on Cray systems directly, but an
        NFS mount of that filesystem ("a") can perform the action.
<rant off>

> The problem comes in later with "Capabilities"...If I drop
>capabilities associated with a "su-cap" program and return to my
>previous set (which may be non-zero), I may still have CAP_DAC_READ_SEARCH
>to allow a lookup to succeed. Currently the access system call would
>seem to be flawed in that it unilaterally drops all caps -- not just the
>caps equivalent to an 'euid' that would have been set by a set[e]uid program.
> Second -- should access check against the effective or permitted
>set? Basically, it seems, access was invented before POSIX caps, so the
>paradigm is ill-defined.
> I could argue that access should check the user's effective set
>as it existed before they ran the sucap program. This would be consistent
>with the use of the 'capable' function as it is used today in the kernel.
>(side note: why is 'capable' defined in 'sched.h' and not 'capability.h'?)
> I could also argue (perhaps less convincingly) that because the
>user *could* raise their effective set to their permitted set, we should
>check the permitted set, but then what's the point of having an 'effective'
>set if we only check the permitted set. So I think that would be

The order (as I see it) should be:

   1. check MAC (when available..)
   2. check effective capability
   3. check DAC (the current access function)
   4. now do the exec or whatever.

The reason for the ordering of steps 2 and 3 is to allow restriction of
a program by the DAC, yet still allow capability permissions to override
DAC. The user can set the effective capability list; then, if the user really
wants to use the application, DAC can still be used to prevent it unless
the user has the DAC_OVERRIDE capability, in which case rule 3 would never
be invoked. This assumes that the rules are evaluated in order (1, 2, 3) until
a rule forbids the action.
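
The ordering can be sketched as a small Python model. The function name
authorize and the boolean stand-ins for the MAC and DAC results are
assumptions for illustration; the only detail taken from Linux is that
CAP_DAC_OVERRIDE is capability number 1.

```python
CAP_DAC_OVERRIDE = 1 << 1  # capability number 1, expressed as a bit mask


def authorize(mac_allows: bool, effective_caps: int, dac_allows: bool) -> bool:
    """Evaluate the rules in order and stop as soon as one decides the outcome."""
    # 1. MAC (when available): a denial here is final.
    if not mac_allows:
        return False
    # 2. Effective capabilities: with DAC_OVERRIDE, rule 3 is never consulted.
    if effective_caps & CAP_DAC_OVERRIDE:
        return True
    # 3. DAC: the ordinary permission check decides.
    return dac_allows
```

In this model a user without DAC_OVERRIDE in the effective set is still
stopped by DAC, while MAC can forbid the action no matter what the
capability set says.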

>> > 1) check access with the process's Effective Caps *before* the
>> >program loaded its caps,
>> Ummm. How about something similar to "geteuid/seteuid" thing that some setuid
>> programs do now?
> The problem is not the call, but what data would you return.
>Under the current paradigm, once you have executed su-cap program, how do
>you know what the original capabilities were?

Just as with the geteuid/getuid calls: each returns the corresponding set.
The process retrieving this data is the one that exists AFTER the exec. This
allows the process to determine which specific privileges are to be used for
subsequent operations.

> Now I've heard some opinions that all the caps should be
>saved so a privileged app can completely restore the previous cap state,
>then fork/exec off a new program at the original cap-state (before the
>su-cap file was executed).
> Is this a desirable paradigm? For some reason I don't like it.
>What does it mean? Would it be optional? (*ick*) For example --
>let's say the new program now executes another su-cap program. That
>program has its own inheritable, permitted and effective set. Do we
>want to merge those bits (with the previous inheritance rules) with the
>processes prior saved cap set or the process's current cap set? The
>results seem less than predictable if we were to allow such.

I don't see this as a reliable situation. Currently the process must open/setup
using the current real uid/gid. If the process is then to perform a privileged
action, the r/e uid/gid is switched. The only program I can think of offhand
that switches back is "su", and then it is usually to a different user.

> Another question -- how do we want to handle files with no capabilities
>sets. Does having no set mean 'no change', or does having no set mean
>the cap set is 'zero' (i.e. it is an unprivileged program). Again, different
>meanings/implications to either way. Which is cleaner?

None set in an inode means to use the inheritance rules. Otherwise,
user-developed programs would get the maximum a user can have, which implies
that the user effectively has no ability to turn various capabilities off
(effective and permitted capabilities can differ).

>> >
>> > These would be applied within the execX system call.
>> >
>> >I also anticipate the need for a 64 bit capability vector. Should we
>> >fold this in now? I can see (from POSIX 1003.1e DS15) some of the
>> >following needed caps (which is not to say this is an exhaustive list).
>> My opinion is definitely 64 bit (at a minimum). I do believe that some of them
>> should be set aside for site use (8-16 bits please). Applications could then use
>> these to determine subsets of actions that may be permitted in user space. The
>> formula for determining availability should be the same as the kernel supported
>> capabilities.
> So you want application specific capabilities in the kernel? Maybe a
>separate cap word for user caps and the kernel does the math on exec's and such.
>I'd rather not intermix them in the same 64-bit word.

Not implemented in the kernel; only the bits are in the kernel (and everywhere
the capability bits are stored). This allows the site to use the evaluation of
capability bits, but the kernel would not implement the specific capability;
that would be up to the privileged application.

Whether the bits are kept in a separate word or in the same word is irrelevant
(other than to future expansion, that is). The security functions are the same.

One way to separate them and still have them in the same word internally is to
give them to the process separately.
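
As an illustration only, here is one way the "same word internally, handed
out separately" idea could look. The layout is an assumption: the top 16 bits
of a 64-bit word are reserved for site use, and the names SITE_MASK,
KERNEL_MASK, split and site_capable are hypothetical.

```python
SITE_BITS = 16                               # assumed size of the site range
SITE_SHIFT = 64 - SITE_BITS                  # site bits occupy bits 48..63
SITE_MASK = ((1 << SITE_BITS) - 1) << SITE_SHIFT
KERNEL_MASK = ((1 << 64) - 1) & ~SITE_MASK   # remaining 48 bits


def split(capword: int):
    """Return (kernel_caps, site_caps) from one stored 64-bit capability word."""
    return capword & KERNEL_MASK, (capword & SITE_MASK) >> SITE_SHIFT


def site_capable(capword: int, site_cap: int) -> bool:
    """Test site capability number site_cap; its meaning is up to the application."""
    if not 0 <= site_cap < SITE_BITS:
        raise ValueError("site capability number out of range")
    return bool((capword >> (SITE_SHIFT + site_cap)) & 1)
```

The storage and the set arithmetic treat all 64 bits alike; only the
interpretation of the site range is left to the privileged application.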

>> I think a configuration item would be a very good thing. In fact, a whole
>> new section ... "Extended Security Options" should be created to hold
>> capabilities. It could be the place to put MAC capability, perhaps even
>> IPSec options when these become available. It might even be reasonable
>> to put the firewall option here (or at least a reference to it).
> I currently have a 'Trust' section under General Ops. MAC is in there
>along with audit. I thought FS-capabilities (and ACL's) belonged under
>fs specific stuff. But MAC and audit and capabilities (if we option them)
>are spread over many places, so I put the first file (audit.c) in a subdir
>'trust' under the 'kernel' subdir.

The name is a bit vague, but the intent is good (how about "Trust Facility"?)

I personally like "Extended Security" because the facility will be difficult
to get "trusted" (which requires evaluation), whereas "Extended Security" may
be used without implying that the facilities have been evaluated or that they
provide a "trusted environment". They won't, unless the system is both
configured and operated in a trusted facility manner. Depending on the options
chosen, the actual security added will range from none to a full trusted
facility; it all depends on the combination of options from the extended
security menu, the network menu, the modules included (or excluded), the
drivers selected....

I don't think we would want to imply more security than is really present
(less is not good either...).


Any opinions expressed are solely my own.

