Re: ext3 to include capabilities?

thospel@mail.dma.be
14 Apr 1999 00:37:43 -0000


Ok, here follows my opinion piece after following the discussion here.
I'll write it in the style of a proof (no, I don't pretend it IS a proof.
It's just that writing it like this forces me to make my assumptions
explicit, so if you disagree you can point at the place where my reasoning
diverges from yours and give explicit criticism instead of general handwaving)

Let's assume we want to see if we can
1) Make Linux capability enabled
2) Try to keep as much of the old world working as possible
(e.g. if possible it would be nice if tar, cp, nfs etc. keep working)
2') so we prefer capabilities to be orthogonal to what we already have.
3) It must be secure (what's the point otherwise ?)
4) Transitions between systems with and without capabilities (and back)
should be as painless as possible (This criterion is different from 2, and
not so important, but a nice extra)

(Yes, I know. 2) to a great degree drives all the decisions, and many people
disagree with how important it is. That's why I call it an assumption.
I'll try to precede all my assumptions with a number)

Ok, reasoning:

We want an executable to have a number of extra properties, called
capabilities. Think of them as a number of extra bits about an executable
that have to be stored somewhere.

We must save these bits somewhere. We can choose:
A) The extra info is external to the file contents.

5) Consider now renaming a file. The name in UNIX is just a property, the
inode is in a sense the "real" thing, so according to 2 we would like
to keep this property (also think of linking and unlinking a file)

6) Also keep in the back of your mind cp, which means creating a new file,
giving it the contents of the old file and possibly setting the
properties of that file to be close to the old file. The ability to
cp (or tar) with capabilities is therefore essentially equivalent
to being able to make a file with capabilities. So only an entity that
can set a capability on a file should be able to carry a
capability over to a copy.

So we are led to the question of: how tightly are the capabilities
linked to the file (and are we linked to file or inode)
Suboptions:
Aa) Capabilities and file are weakly coupled (kernel doesn't bother)
Typical example is a non-kernel-controlled database describing
filenames and which capabilities they have.

- It's very easy to lose a capability when renaming a file.
- tar/nfs/cp won't work since they won't know where to get the
external capability info.
- When starting up a file, kernel should get capability stuff from
this database. This seems to suggest the kernel calling user code
(yuck). Also leads to races between setting a cap and invoking
a cap (double yuck)
Nah, doesn't work very well. Let's ignore this one.
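
The rename problem of Aa can be seen in a small sketch. Everything here is
an assumption made for illustration: the dictionary stands in for the
user-space database, and the file names and the capability name are made up.

```python
import os
import tempfile

# Hypothetical user-space database: path name -> set of capability names.
cap_db = {}

with tempfile.TemporaryDirectory() as d:
    old = os.path.join(d, "ping")
    new = os.path.join(d, "ping.real")
    with open(old, "w") as f:
        f.write("pretend this is an executable\n")

    cap_db[old] = {"CAP_NET_RAW"}      # admin grants a cap by file name
    inode_before = os.stat(old).st_ino

    os.rename(old, new)                # plain mv; the database is not told

    inode_after = os.stat(new).st_ino
    caps_after = cap_db.get(new, set())    # lookup under the new name
    same_inode = inode_before == inode_after
    stale_entry = old in cap_db            # entry still filed under old name

print(same_inode, caps_after, stale_entry)
```

In this sketch the inode survives the rename but the capability does not,
and the database is left with an entry for a name that no longer exists.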
Bb) Capabilities and files are tightly coupled (kernel guarantees the
coupling). This will make things like rename work as expected. Two
possibilities:
Bba) The coupling is external to the kernel. Since the kernel has
no control over this "other place" (or you again need the kernel
calling userspace), it must be something invariant of the file.

Typical example: capabilities are in a database that couples
inodes to capabilities.
- essentially solves the race in Aa
- Solves the losing cap under mv
- At startup the kernel still has to get it from the database
(still needs the kernel calling user code, but only at exec time)
- Still doesn't work with tar/nfs/copy, since they don't know
that they have to look up/move this info
(notice that we can't use the standard UNIX attributes that these
apps DO send as extra info with the file contents since:
- There are more CAP bits than there are of this type
- MOST of them already have a meaning, so we can't just
steal them for our own purposes (breaks 2'))

Bba already works better, but still loses a lot.
Bbb) The coupling is internal to the kernel.
This solves the "get CAPS at startup" problem, since the kernel
has them. Will still fail tar/nfs/cp

Still, let's look into this one a bit more since it's the one
that makes sense if we drop 2)

7) Now consider that we want caps to survive a reboot.
Bbba) The coupling does not live in the filesystem of the file.
So it's somewhere else, say in a small raw partition or
file.
8) We like things not to depend on what's mounted where
- Now if we move a disk to another machine, the file will
lose its CAPS there. That's not so bad. One could even
argue that's a good point.
- If we import/mount disk from another machine, we run the
risk of picking up a CAP. That is something we want to
avoid.
- Even if we don't move disks, but just mount things on
another point, we don't want weird gaining/losing
CAPS.
So to make this work a cap should probably be coupled to a
UUID/inode pair
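
A minimal sketch of such a table, assuming the kernel keeps a map from
(filesystem UUID, inode number) to a capability set. The function names,
UUID strings and inode number are hypothetical.

```python
# Hypothetical kernel-side table: (filesystem UUID, inode number) -> caps.
cap_table = {}

def grant(fs_uuid, inode, caps):
    """Record a capability set for one inode on one specific filesystem."""
    cap_table[(fs_uuid, inode)] = frozenset(caps)

def lookup(fs_uuid, inode):
    """Unknown (UUID, inode) pairs get no capabilities at all."""
    return cap_table.get((fs_uuid, inode), frozenset())

LOCAL_FS = "uuid-local-root"       # made-up identifiers
FOREIGN_FS = "uuid-foreign-disk"

grant(LOCAL_FS, 1234, {"CAP_NET_RAW"})

# The mount point is not part of the key, so remounting changes nothing.
local_caps = lookup(LOCAL_FS, 1234)
# An imported disk has another UUID, so the same inode number gains nothing.
foreign_caps = lookup(FOREIGN_FS, 1234)
```

Under these assumptions an imported disk never picks up a CAP by accident,
and a disk moved to another machine simply has no entries there.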

It looks like in fact Bbba is a possible solution if you
drop 2). The main drawback seems to be the somewhat ugly
kernel controlled raw partition or file. It has some
interesting semantics though (you don't automatically
trust an externally imported disk). This might be an
alternative worth thinking about.
Bbbb) The coupling lives in the same filesystem as the file.
Since with 5) we already identified the inode as the
thing that we want to have the CAPS, we can just as well
store them essentially there. This is what the
capabilities-in-filesystem people are basically doing.
B) The extra info is internal to the file (executable) contents.
Now consider starting an executable. The steps are:
1. kernel loads the file in memory
2. (kernel maybe sets some caps for the process)
3. kernel gets the memory image going in a user process
4. (possibly the user process changes CAPS around a bit so we end up with the
intended set)
5. Program does its job

The question is, can we drop either step 2 or 4 here.
Step 4 already runs as user code, so that user could possibly already
start controlling his process. Since we don't want the user to control
what happens here, step 2 must at least have flagged the process to be
not completely under user control (no gdb attach e.g. during step 4)

Ba) we keep step 4
So execute is:
- kernel loads image
- kernel sets at least a "handsoff" bit.
- kernel sets process running
- user space shuffles some more CAPS (e.g. stuff in crt.o, or
first code in main).
- Program does its job

9) We would like an administrator to in principle be able to go to
a file and list its caps and possibly change them with some
generic setcap program.

Baa) Step 4 is freely programmable by the user and can end up
almost anywhere in the code (e.g. it works with calls to some
mythical libcap)
A program can't fully analyze another program, so lscap and
setcap would be undependable.
=> Nope
Bab) The info is in fixed places in the executable, determined by
the file-layout.

e.g: a matched crt.o looks into the file being started, looks
in the special well known places and sets the right caps.

This of course has the same sort of problems as a #! line
And this can be solved the same way (kernel passes trusted
open file descriptor on call).

I think it's possible to make something work along those lines.

You must be very careful though that after the initialisation
phase the "normal run" phase can't be tricked into calling some
of this "step 4 fiddle" stuff again, possibly with user-chosen
data (the kind of suid problem we want to fix with CAPS). This
can be solved e.g. by dropping the "fiddle CAP" after fiddling.

Notice that if we move the crt.o part into the kernel, we end up
with Bb (Bb will also introduce a special 1 bit attribute like
the handsoff bit), and don't need anti fiddle bits and special
startup magic. In fact, I don't see any real gain from Bab
relative to Bb, so let's also forget about this one.
Bb) we drop step 4
Exec is:
- kernel loads image
- kernel sets all caps (derived from file)
- kernel sets process running

Since we want to keep as much of old UNIX as possible, not every
file will be capability enhanced. Whether it is must be stored
somewhere (this is one bit of info).

3) implies that this bit can't live in the file itself, or the user
could just make an executable file with this bit, and gain caps.

Conclusion: There is an "enable caps" bit somewhere, and it's not
in the file contents.
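
The conclusion can be sketched as a toy model of exec in the "drop step 4"
branch. The class, its field names and the capability names are assumptions
for illustration only, not a proposed kernel interface.

```python
from dataclasses import dataclass

@dataclass
class FileOnDisk:
    embedded_caps: frozenset   # caps listed inside the executable contents
    hascaps: bool              # out-of-band flag, not settable by plain users

def caps_at_exec(f):
    """The kernel sets all caps itself at exec time, derived from the file."""
    if not f.hascaps:
        # Ordinary executable: whatever the contents claim is ignored,
        # since any user can write arbitrary file contents.
        return frozenset()
    return f.embedded_caps

# A user-built file that merely claims caps in its contents.
forged = FileOnDisk(frozenset({"CAP_SYS_ADMIN"}), hascaps=False)
# A file on which a privileged entity has set the flag.
blessed = FileOnDisk(frozenset({"CAP_NET_RAW"}), hascaps=True)

print(caps_at_exec(forged), caps_at_exec(blessed))
```

The point of the sketch is only that the trust decision hangs on a bit the
user cannot write; where that bit lives is the next question.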

Repeat the whole reasoning we did for "all cap bits" again, but you
are forbidden to end up in Bb (that would mean a flag to flag that
there are caps, and you would have to repeat the reasoning again ->
infinite loop). The only thing there that depended on the number of
bits (the only thing different) was the Bb part, where it was
concluded that the CAP bits can't be in the file attributes since
there are too many (it was the basic reason to reject the Bb branch).

Conclusion: If we want nfs/cp/tar to work, the "hascap" bit must
be an existing unix file attribute (the only other info these
progs copy outside of the file contents)

This bit should NOT be user settable, or anybody can create their own
CAPS. So we might have to remove an ability to free up a bit. This
must however be a bit not in any use, or we are not orthogonal (2')

ok, so we have:
- rwx bits.
The rw bits can have any value and we want caps to be
orthogonal (2'), so they are out.
- Remove x (that's one thing we know about an executable, it's
executable (ok, we don't know WHICH x bit is set)).

But: - we can't distinguish a plain file that looks like an
executable from an executable anymore (we dropped that bit)
- User expects to be able to toggle the x bit on a plain file.
That would mean user toggling capenable. Yuck.
- Utilities like which work by looking for executable files
in a user path. They stop working, and we didn't want to
break common programs.
- suid:
That's a nice one, it already has a "kill by write semantic"
(which would also be for root in a CAPs system, but not e.g. for
someone with CAPSETCAP). It already has a much used property,
called setuid. Due to (2') we don't want to kill that.
- sgid: same problem
- socket/symbolic link/block dev/char dev/dir bits.
These are all bits belonging to something without content: useless
- regular file bit: well, already in use. It IS a regular file.
- High bits beyond _S_IFMT (all lower bits are in use)
We can't depend on there being any (16 bits are used). Even if
there were, your tar/cp/NFS might mask them
- atime/mtime/ctime: ok, they have potential. We could steal
the least significant bit of one of these (measure time DOS-style,
in 2 second units). Bit ugly and might cause a little bit of
breakage (people "KNOW" you can depend on a one sec granularity.
Some programs might even depend on it)
- uid/gid
Nope, they already have a meaning (together with the rwx bits they
determine all kinds of access permissions, with the s bit they
determine target user, with g target group).
- Sticky bit: Aha, bingo
It has historically already been used to mean something for
executables, so standard tools should be able to handle it in
either state. It's not used (for files, e.g. executables) in
linux.
Drawbacks:
- Currently a user controllable attribute. That must go.
- Has a meaning on other unix systems, where it remains
settable. So e.g. you can't trust NFS mounts from other
places where it still is settable (squash_sticky anyone ?)
Still, it was the last available bit, and the drawbacks are
acceptable I think.
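
The bit survey above can be checked against the mode constants in Python's
stat module. The has_caps_flag helper is hypothetical and only encodes the
proposal made here; nothing in any kernel works this way today.

```python
import stat

# The file-type field, obtained by masking an all-ones 16 bit mode.
type_mask = stat.S_IFMT(0o177777)
# suid, sgid and sticky.
special = stat.S_ISUID | stat.S_ISGID | stat.S_ISVTX
# rwx for user, group and other.
rwx = stat.S_IRWXU | stat.S_IRWXG | stat.S_IRWXO

# Together these cover every one of the 16 mode bits.
all_used = type_mask | special | rwx
print(oct(all_used))

def has_caps_flag(mode):
    """Hypothetical 'hascaps' test: a regular file with the sticky bit set."""
    return stat.S_ISREG(mode) and bool(mode & stat.S_ISVTX)
```

A sticky directory such as /tmp is not affected by this test, since the
flag is only looked at for regular files.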

To recap:
- The sticky bit is the "hascaps" flag
- Normal users can't do chmod +t anymore. This is only for
an entity with CAPSETCAP
- If you have write permission on a file, writing automatically
removes the sticky bit (possibly unless you had CAPSETCAP)
- Executable layout is such that kernel can find the CAPs
datastructure at load time
- We could use the least significant bit of one of the file times
as an alternative
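
The chmod and write rules from this recap, as a sketch on plain mode
numbers. The function names are made up, and the "possibly unless you had
CAPSETCAP" case is resolved here in favour of keeping the bit, which is
just one possible reading.

```python
import stat

def chmod_allowed(old_mode, new_mode, has_capsetcap):
    """Only a CAPSETCAP holder may turn the sticky ("hascaps") bit on."""
    turning_on = bool(new_mode & stat.S_ISVTX) and not (old_mode & stat.S_ISVTX)
    return has_capsetcap or not turning_on

def mode_after_write(mode, has_capsetcap):
    """Writing clears the sticky bit, here unless the writer has CAPSETCAP."""
    if has_capsetcap:
        return mode
    return mode & ~stat.S_ISVTX

# A plain user may drop the bit but not set it.
print(chmod_allowed(0o755, 0o1755, has_capsetcap=False))
print(chmod_allowed(0o1755, 0o755, has_capsetcap=False))
# A write by a plain user strips the flag from the mode.
print(oct(mode_after_write(0o1755, has_capsetcap=False)))
```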

About up/downgrading:
- Going back to an older kernel is safe: the t bit loses meaning
and the executable loses caps: fine !
- Going forward: Damn, you have to sweep the filesystem for
+t flags. Better not plan on doing this more than once.
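
Such a sweep could look roughly like this. It is a sketch under the
assumption that "suspect" simply means a regular file that already has the
sticky bit set; the function names are made up.

```python
import os
import stat

def is_suspect(mode):
    """Regular file that already carries the sticky bit."""
    return stat.S_ISREG(mode) and bool(mode & stat.S_ISVTX)

def sweep(root):
    """Return the paths of regular files under root that have +t set."""
    hits = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            try:
                mode = os.lstat(path).st_mode   # lstat: do not follow symlinks
            except OSError:
                continue                        # vanished or unreadable entry
            if is_suspect(mode):
                hits.append(path)
    return hits
```

Something like sweep("/home") would then list the files to inspect (or to
chmod -t) before booting a capability-aware kernel for the first time.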

Also notice that I haven't fixed a file format, but the hooks in
ELF for extra sections are...seductive.

Conclusion:
- It's possible to add CAPS keeping the old semantics as much as possible
- The price you pay is:
- You only want to move to CAPS once (each move involves an audit of
your filesystem. Maybe you can get away with just sweeping the
/home filesystem though)
- You must be more careful about non-cap-linux remote filesystems
(we were already used to this with the +s bit and root ownership)
- We will never be able to go for the classic sticky bit semantics
anymore (do we care ?)

Let's go for it :-)

-- 
bumper sticker seen on stealth bomber:
"IF YOU CAN READ THIS, THEN WE WASTED 32 BILLION BUCKS."

- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majordomo@vger.rutgers.edu Please read the FAQ at http://www.tux.org/lkml/