Re: [PATCH v3 1/2] capabilities: Ambient capabilities

From: Andy Lutomirski
Date: Tue Jun 09 2015 - 20:01:28 EST


On Tue, Jun 9, 2015 at 4:09 PM, Kees Cook <keescook@xxxxxxxxxxxx> wrote:
> On Wed, May 27, 2015 at 4:47 PM, Andy Lutomirski <luto@xxxxxxxxxx> wrote:
>> Credit where credit is due: this idea comes from Christoph Lameter
>> with a lot of valuable input from Serge Hallyn. This patch is
>> heavily based on Christoph's patch.
>>
>> ===== The status quo =====
>>
>> On Linux, there are a number of capabilities defined by the kernel.
>> To perform various privileged tasks, processes can wield
>> capabilities that they hold.
>>
>> Each task has four capability masks: effective (pE), permitted (pP),
>> inheritable (pI), and a bounding set (X). When the kernel checks
>> for a capability, it checks pE. The other capability masks serve to
>> modify what capabilities can be in pE.
>>
>> Any task can remove capabilities from pE, pP, or pI at any time. If
>> a task has a capability in pP, it can add that capability to pE
>> and/or pI. If a task has CAP_SETPCAP, then it can add any
>> capability to pI, and it can remove capabilities from X.
>>
>> Tasks are not the only things that can have capabilities; files can
>> also have capabilities. A file can have no capability information at
>> all [1]. If a file has capability information, then it has a
>> permitted mask (fP) and an inheritable mask (fI) as well as a single
>> effective bit (fE) [2]. File capabilities modify the capabilities
>> of tasks that execve(2) them.
>>
>> A task that successfully calls execve has its capabilities modified
>> for the file ultimately being executed (i.e. the binary itself if
>> that binary is ELF or for the interpreter if the binary is a
>> script.) [3] In the capability evolution rules, for each mask Z, pZ
>> represents the old value and pZ' represents the new value. The
>> rules are:
>>
>> pP' = (X & fP) | (pI & fI)
>> pI' = pI
>> pE' = (fE ? pP' : 0)
>> X is unchanged
>>
>> For setuid binaries, fP, fI, and fE are modified by a moderately
>> complicated set of rules that emulate POSIX behavior. Similarly, if
>> euid == 0 or ruid == 0, then fP, fI, and fE are modified differently
>> (primarily, fP and fI usually end up being the full set). For nonroot
>> users executing binaries with neither setuid nor file caps, fI and
>> fP are empty and fE is false.
>>
>> As an extra complication, if you execute a process as nonroot and fE
>> is set, then the "secure exec" rules are in effect: AT_SECURE gets
>> set, LD_PRELOAD doesn't work, etc.
>>
>> This is rather messy. We've learned that making any changes is
>> dangerous, though: if a new kernel version allows an unprivileged
>> program to change its security state in a way that persists across
>> execution of a setuid program or a program with file caps, this
>> persistent state is surprisingly likely to allow setuid or
>> file-capped programs to be exploited for privilege escalation.
>>
>> ===== The problem =====
>>
>> Capability inheritance is basically useless.
>>
>> If you aren't root and you execute an ordinary binary, fI is zero,
>> so your capabilities have no effect whatsoever on pP'. This means
>> that you can't usefully execute a helper process or a shell command
>> with elevated capabilities if you aren't root.
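
To make that concrete, here is a throwaway C sketch of the status-quo
rules quoted above. It is not kernel code: the types and function names
(capmask_t, task_caps, file_caps, exec_status_quo, exec_ordinary_binary)
are made up for illustration, and masks are assumed to fit in 64 bits.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

typedef uint64_t capmask_t;          /* hypothetical: one bit per capability */

struct task_caps {                   /* hypothetical, not the kernel's struct cred */
    capmask_t pP, pI, pE, X;
};

struct file_caps {                   /* hypothetical view of the executed file */
    capmask_t fP, fI;
    bool fE;                         /* a single bit, not a mask */
};

/* Status-quo evolution rules; pI and X carry over unchanged. */
static struct task_caps exec_status_quo(struct task_caps old, struct file_caps f)
{
    struct task_caps next = old;

    next.pP = (old.X & f.fP) | (old.pI & f.fI);
    next.pE = f.fE ? next.pP : 0;
    return next;
}

/* Nonroot exec of a binary with neither setuid nor file caps:
 * fP and fI are empty and fE is false. */
static struct task_caps exec_ordinary_binary(struct task_caps old)
{
    struct file_caps none = { 0, 0, false };
    struct task_caps next = exec_status_quo(old, none);

    /* Holds for any old masks: nothing survives in pP' or pE'. */
    assert(next.pP == 0 && next.pE == 0);
    return next;
}
```

Whatever the caller holds in pP, pI, or pE, exec_ordinary_binary()
returns empty pP' and pE'. Only pI survives, and it has no effect until
some file with a nonzero fI comes along.
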
>>
>> On current kernels, you can sort of work around this by setting fI
>> to the full set for most or all non-setuid executable files. This
>> causes pP' = pI for nonroot, and inheritance works. No one does
>> this because it's a PITA and it isn't even supported on most
>> filesystems.
>>
>> If you try this, you'll discover that every nonroot program ends up
>> with secure exec rules, breaking many things.
>>
>> This is a problem that has bitten many people who have tried to use
>> capabilities for anything useful.
>>
>> ===== The proposed change =====
>>
>> This patch adds a fifth capability mask called the ambient mask
>> (pA). pA does what most people expect pI to do.
>>
>> pA obeys the invariant that no bit can ever be set in pA if it is
>> not set in both pP and pI. Dropping a bit from pP or pI drops that
>> bit from pA. This ensures that existing programs that try to drop
>> capabilities still do so, with a complication. Because capability
>> inheritance is so broken, setting KEEPCAPS, using setresuid to
>> switch to nonroot uids, and then calling execve effectively drops
>> capabilities. Therefore, setresuid from root to nonroot
>> conditionally clears pA unless SECBIT_NO_SETUID_FIXUP is set.
>> Processes that don't like this can re-add bits to pA afterwards.
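
The invariant can be sketched in a few lines of C. This is only an
illustration under stated assumptions: the names (capmask_t, task_caps,
ambient_raise, set_permitted_inheritable) are hypothetical, masks are
assumed to be 64 bits wide, and the setresuid fixup is deliberately left
out.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

typedef uint64_t capmask_t;          /* hypothetical: one bit per capability */

struct task_caps {                   /* hypothetical, not the kernel's struct cred */
    capmask_t pP, pI, pA;
};

/* Try to add one capability to pA. Refuse unless the bit is already
 * present in both pP and pI. */
static bool ambient_raise(struct task_caps *t, int cap)
{
    if (cap < 0 || cap >= 64)
        return false;

    capmask_t bit = (capmask_t)1 << cap;

    if (!(t->pP & bit) || !(t->pI & bit))
        return false;
    t->pA |= bit;
    return true;
}

/* Install new permitted and inheritable masks, trimming pA so that
 * a bit dropped from either mask is dropped from pA as well. */
static void set_permitted_inheritable(struct task_caps *t,
                                      capmask_t pP, capmask_t pI)
{
    t->pP = pP;
    t->pI = pI;
    t->pA &= pP & pI;
    assert((t->pA & ~(t->pP & t->pI)) == 0);   /* pA stays within pP & pI */
}
```

A program that knows nothing about pA and simply clears bits in pP or pI
therefore ends up with those bits cleared in pA too.
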
>>
>> The capability evolution rules are changed:
>>
>> pA' = (file caps or setuid or setgid ? 0 : pA)
>> pP' = (X & fP) | (pI & fI) | pA'
>> pI' = pI
>> pE' = (fE ? pP' : pA')
>> X is unchanged
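
For anyone who prefers code to notation, the new rules can be written
out as a small C function. This is a sketch, not the patch itself: the
type and function names (capmask_t, task_caps, file_caps,
evolve_on_exec) and the "privileged" flag are hypothetical, and masks
are assumed to fit in 64 bits.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

typedef uint64_t capmask_t;          /* hypothetical: one bit per capability */

struct task_caps {                   /* hypothetical, not the kernel's struct cred */
    capmask_t pP, pI, pE, pA, X;
};

struct file_caps {                   /* hypothetical view of the executed file */
    capmask_t fP, fI;
    bool fE;                         /* a single bit, not a mask */
    bool privileged;                 /* file caps, setuid, or setgid present */
};

/* Evolution rules with the ambient mask; pI and X carry over unchanged. */
static struct task_caps evolve_on_exec(struct task_caps old, struct file_caps f)
{
    /* Precondition: pA never holds a bit missing from pP or pI. */
    assert((old.pA & ~(old.pP & old.pI)) == 0);

    struct task_caps next = old;

    next.pA = f.privileged ? 0 : old.pA;
    next.pP = (old.X & f.fP) | (old.pI & f.fI) | next.pA;
    next.pE = f.fE ? next.pP : next.pA;
    return next;
}
```

With an ordinary binary (fP = fI = 0, fE false, privileged false), pA
flows straight into pP' and pE'. With privileged set, pA' is zero and
the result is exactly what the old rules would have produced.
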
>>
>> If you are nonroot but you have a capability, you can add it to pA.
>> If you do so, your children get that capability in pA, pP, and pE.
>> For example, you can set pA = CAP_NET_BIND_SERVICE, and your
>> children can automatically bind low-numbered ports. Hallelujah!
>
> Chrome OS could use this right now. :)
>
>> Unprivileged users can create user namespaces, map themselves to a
>> nonzero uid, and create both privileged (relative to their
>> namespace) and unprivileged process trees. This is currently more
>> or less impossible. Hallelujah!
>>
>> You cannot use pA to try to subvert a setuid, setgid, or file-capped
>> program: if you execute any such program, pA gets cleared and the
>> resulting evolution rules are unchanged by this patch.
>>
>> Users with nonzero pA are unlikely to unintentionally leak that
>> capability. If they run programs that try to drop privileges,
>> dropping privileges will still work.
>>
>> It's worth noting that the degree of paranoia in this patch could
>> possibly be reduced without causing serious problems. Specifically,
>> if we allowed pA to persist across executing non-pA-aware setuid
>> binaries and across setresuid, then, naively, the only capabilities
>> that could leak as a result would be the capabilities in pA, and any
>> attacker *already* has those capabilities. This would make me
>> nervous, though -- setuid binaries that tried to privilege-separate
>> might fail to do so, and putting CAP_DAC_READ_SEARCH or
>> CAP_DAC_OVERRIDE into pA could have unexpected side effects.
>> (Whether these unexpected side effects would be exploitable is an
>> open question.) I've therefore taken the more paranoid route. We
>> can revisit this later.
>
> I think this is correct. Stuff using file caps or set*id bits is
> fundamentally using a different privilege management model. Keeping pA
> separate makes a lot of sense to me.
>
>> An alternative would be to require PR_SET_NO_NEW_PRIVS before
>> setting ambient capabilities. I think that this would be annoying
>> and would make granting otherwise unprivileged users minor ambient
>> capabilities (CAP_NET_BIND_SERVICE or CAP_NET_RAW for example) much
>> less useful than it is with this patch.
>
> Agreed: we should keep nnp out of this.
>
>> ===== Footnotes =====
>>
>> [1] Files that are missing the "security.capability" xattr or that
>> have unrecognized values for that xattr end up with has_cap set to
>> false. The code that does that appears to be complicated for no
>> good reason.
>
> Would it make more sense to have has_cap true, but have it lack any actual caps?

I assume you're referring to the case where we fail to parse the
xattr. If so, I don't really know if or when this happens. Should
that be addressed separately from this patch set?

>
>> [2] The libcap capability mask parsers and formatters are
>> dangerously misleading and the documentation is flat-out wrong. fE
>> is *not* a mask; it's a single bit. This has probably confused
>> every single person who has tried to use file capabilities.
>
> Sounds like it would be a valuable documentation patch.

I'll try. Let's get the current thing done first.

>
>> [3] Linux very confusingly processes both the script and the
>> interpreter if applicable, for reasons that elude me. The results
>> from thinking about a script's file capabilities and/or setuid bits
>> are mostly discarded.
>
> I wonder if this is important enough to fix?

Not sure.

However, the fact that, AFAICT, LSM state due to a script (as opposed
to an interpreter) is preserved sounds rather dangerous to me. I'm not
sure whether we can safely fix that at this point.

--Andy