Re: devfs - the missing link

From: Alexander Viro
Date: Fri May 12 2000 - 08:01:11 EST

On Thu, 11 May 2000, Neil Brown wrote:

> My one sentence summary:
> Device special files are *not* devices, they are gateways to
> devices.

No arguments here. However, there is one thing you are missing:
s/devices\./devices, which are provided by different parts of the kernel./

> Before I embark on the elaboration, it might help to identify some
> particular issues that seem to have caused particular disagreement. I
> believe that the approach discussed below answers all of these issues
> to some degree. I'll let you be the judge.
> 1/ persistence of permissions on device files - not trivial when
> device files are not persistent. Several solutions have been
> discussed with no clear agreement.
> 2/ /dev in a chroot gaol. This requires a /dev which is the same
> as, but different from, the "real" /dev.
> 3/ 16 bit device numbers are too small. Do we enlarge them?
> Deprecate them? If so, how?
> 4/ Where and how is devfs mounted? /dev? /devices? at the same time
> as /? at the same time as /proc?
> 5/ The choice of names of things in devfs - the Linus-imposed scheme
> vs the original scheme.

    6/ Relations between devices and other special files provided by the
same parts of the kernel, e.g. procfs ones.
    7/ Dynamic adding and removing of such parts.
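For concreteness on item 3: in current (2.2/2.4-era) kernels a device number is 16 bits, split into an 8-bit major and an 8-bit minor, as in the kernel's MAJOR()/MINOR()/MKDEV() macros. The user-space sketch below uses hypothetical helper names (old_mkdev() and friends) to mirror that split and show where the ceiling comes from: at most 256 majors of 256 minors each, separately for block and char devices.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical names; the 8:8 split mirrors the kernel's MAJOR()/MINOR(). */
#define OLD_MINORBITS 8
#define OLD_MINORMASK ((1u << OLD_MINORBITS) - 1)

typedef uint16_t old_dev_t;

/* Pack a major/minor pair; anything wider than 8 bits does not fit. */
static old_dev_t old_mkdev(unsigned major, unsigned minor)
{
    assert(major <= 0xffu && minor <= 0xffu);
    return (old_dev_t)((major << OLD_MINORBITS) | minor);
}

static unsigned old_major(old_dev_t dev) { return (unsigned)dev >> OLD_MINORBITS; }
static unsigned old_minor(old_dev_t dev) { return (unsigned)dev & OLD_MINORMASK; }

/* Distinct numbers available in one namespace (block or char): 256 * 256. */
static unsigned long old_dev_capacity(void)
{
    return (1ul << OLD_MINORBITS) * (1ul << OLD_MINORBITS);
}
```

For example, old_mkdev(3, 1) yields 0x0301, the number traditionally used for /dev/hda1 (block major 3, minor 1).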

> B: A brief outline of what (I think) I would like a device
> filetree to look like.
> The traditional Unix device tree is clearly limiting. There are two
> particular aspects that are limiting.
> 1/ 3.5 level hierarchy is too rigid.
> 2/ numeric identifiers are hard to manage, and not human-friendly.
> The "obvious" response to this is to have a hierarchy that looks
> like a filesystem - with textual names for elements and arbitrarily
> many levels as suit particular types of devices - and this is what
> devfs does.
> My reason for proposing something different to the current devfs
> structure is that I am coming to the problem with different
> priorities. devfs seems to want to copy the traditional layout of
> /dev, and with good reason. I have no desire to mimic that, but
> instead a desire to mimic the 3-level hierarchy of device numbers -
> but take it a bit further.

Wait a minute. It's all nice and dandy, but you are missing a serious
point here - you are making all drivers push their stuff into this tree
and then you have problems with restricting it. There is a good ol' way to
do such things - keep it in different trees and use mount. This part of the
unified hierarchy (_who_ provides the thing) is missing in your variant.

> I think (hope) that you get the idea. The device tree reflects the
> physical organisation of devices where possible, and allows for
> "virtual" devices to help flatten the hierarchy. The tree contains
> not only devices, but also information about devices such as is
> often found in /proc.

And drivers become aware of the non-local structure in that tree. There
_is_ a good reason why absolute symlinks in packages are frowned upon and
it's mostly the same case.

> Just to bring you back to where we are up to, this hierarchy is NOT
> meant to replace /dev. It replaces the block-or-char/major/minor
> hierarchy. Like that hierarchy, it has little in the way of access
> control.
> Though the bc/major/minor hierarchy is not directly accessible from
> the filesystem, it would be nice if this hierarchy were. We could
> mount it somewhere like /devices. However I would prefer it to go
> somewhere like //devices or //linux/devices. It wouldn't get
> mounted there. It would simply always be there, much as / is always
> there and the bc/major/minor hierarchy is always ... wherever it
> is.
> It is true that Linux doesn't currently differentiate // from /, but
> POSIX allows us to, and there has been talk about going that way,
> and we can keep that as a long term goal, and mount it in
> /proc/devices or similar for now.

        POSIX allows next to every kludge that has ever hit the fan in
any long-dead Missed'em'V variant. // is _ugly_. If you want to say
"namespaces" - just say it. There are much cleaner ways to do that thing
and basing the choice on "POSIX allows" is ridiculous.

> Obviously symlinks as they are don't cut it, but there are three
> bits (setuid, setgid, sticky) that we can use to enhance symlinks -
> and Unix has a (murky?) history of using these bits ... creatively.
> Let me propose that a symlink with, say, the setuid bit gets treated
> differently from ordinary symlinks, and somewhat like device special files.

[snip the horrible kludge]

Why on Earth do you _want_ to keep them in the filesystem? Why bother
bit-shuffling the inode fields in a style that reeks of CP/M directories
and S-type object files? You _already_ have userland participating in
setting up your unified tree. See man 5 fstab for details.

> However, this structure may be a bit too limiting. Suppose that
> rather than giving away access to a specific device, I want to give
> away access to a directory full of devices. e.g. You can have
> access to any digital camera that gets plugged in. I really want to
> be able to have a devlink that points to a directory. What does
> that mean? In particular, how is the ACL carried along if I chdir
> through a devlink, and how am I prevented from using ".." to walk
> all over the device tree?
> The abstraction that seems to work best here is a "mount". When I
> access a devlink, particularly one to a directory, I want the
> directory to effectively be mounted on the symlink ... and with Al's
> new mount stuff there is no problem mounting different bits in
> different places, possibly mounting the one directory in several
> places. This provides control of "..", but what does it do for
> preserving the ACL?

Yeah... It's even easier if you drop these "devlinks" completely, put the
description of mounting to the place where it belongs and keep the trees
separate from the very beginning.

> Here I think we need one more bit of magic.
> Every object in the device tree will have ownership/ACL, though the
> ACLs will be chosen from a fairly limited set and almost everything
> will be owned by root. However, some objects, particularly devices,
> will have a sticky bit set. In the device filesystem, the sticky bit
> will have a special meaning. It means "use the owner/acl of the
> mountpoint". The mountpoint is always available through the
> vfsmount structure so getting hold of this should be quite easy.
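A minimal sketch of the sticky-bit rule proposed above, assuming simplified stand-in structures (attrs, dev_node, mount_point and effective_attrs() are hypothetical, not kernel types) and reducing the ACL to ordinary owner/group/mode bits: a sticky object takes owner, group and rwx bits from the mountpoint it was reached through, while keeping its own file type.

```c
#define _XOPEN_SOURCE 700   /* exposes S_ISVTX and S_IFMT in strict modes */
#include <assert.h>
#include <stddef.h>
#include <sys/stat.h>
#include <sys/types.h>

/* Hypothetical, heavily simplified stand-ins; "ACL" is just uid/gid/mode. */
struct attrs {
    uid_t  uid;
    gid_t  gid;
    mode_t mode;        /* file type bits plus permission bits */
};

struct dev_node    { struct attrs own; };  /* object in the device fs */
struct mount_point { struct attrs at;  };  /* directory it is mounted on */

/* Sticky objects borrow owner, group and rwx bits from the mountpoint;
 * everything else keeps its own attributes. */
static struct attrs effective_attrs(const struct dev_node *n,
                                    const struct mount_point *mnt)
{
    assert(n != NULL && mnt != NULL);
    if (!(n->own.mode & S_ISVTX))
        return n->own;

    struct attrs a = mnt->at;
    /* keep the object's own type (e.g. S_IFCHR), take only the rwx bits */
    a.mode = (n->own.mode & S_IFMT) | (mnt->at.mode & 0777);
    return a;
}
```

The object's own type bits are kept so that a character device stays a character device; only the access attributes are borrowed.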

> One thing that this doesn't answer is how symlinks inside the device
> filesystem get treated when you have only mounted part of it.
> Possibly these symlinks need to be devlinks as well. I haven't
> completely resolved this issue for myself, but I don't think it is
> insurmountable.

> So, in summary, we have
> - devlinks which are symlinks with setuid bit set.
> - chown/chmod affect devlinks directly, not the target like
> with symlinks.
> - accessing a devlink does some sort of magic mount
> - in the device filesystem, "sticky" objects get their permissions
> from the mountpoint.
> - only root (CAP_MKNOD) can make devlinks.
> It might actually be nice to use devlinks more generally:
> /usr -> //devices/filesystem/ext2fs/long-hex-uuid/fs
> obviates some of the need for /etc/fstab.

Oh, lovely. So what are you going to do if I want to have one set of
mountpoints for "web design group" lusers and completely different one -
for PR wankers? Keeping two files is trivial. Two sets of symlink bodies...
Thank you very much, I'll pass this one.

        There is one good point, though - ability to have default object
attributes inherited from the mountpoint. _That_ is a very good idea - as
a matter of fact we already have that for many filesystems. However, I
don't believe that it should be determined by normal inode attributes.
IOW, I don't believe that it should be available for files on the
filesystems that are perfectly able to deal with the per-file attributes
without any external help. The filesystem driver should be able to tell which
inode attributes are inherited _during the read_inode()_. That way you can
have drivers populating their own trees and marking the "inheritable"
inodes as such. Then we can mount such tree wherever needed with whatever
defaults we want. Notice that we may have _any_ tree structure - we can
always bind the object at any place we like.
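A rough sketch of that alternative, under the assumption that inheritance is expressed as a driver-set flag word rather than an inode mode bit: the driver fills in the flags when it reads the inode, and each mount carries its own defaults (fake_inode, mount_defaults, the INHERIT_* flags and effective_inode() are all hypothetical names). Mounting the same tree twice with different defaults then gives two different views without touching the tree itself.

```c
#define _XOPEN_SOURCE 700
#include <assert.h>
#include <stddef.h>
#include <sys/types.h>

/* Hypothetical flags a driver could set in its read_inode() to say which
 * attributes come from the mount rather than from the inode. */
enum {
    INHERIT_UID  = 1u << 0,
    INHERIT_GID  = 1u << 1,
    INHERIT_MODE = 1u << 2,
};

struct fake_inode {
    uid_t    uid;
    gid_t    gid;
    mode_t   mode;      /* type bits plus permission bits */
    unsigned inherit;   /* set by the driver, never by chmod/chown */
};

/* Per-mount defaults, e.g. taken from mount options. */
struct mount_defaults {
    uid_t  uid;
    gid_t  gid;
    mode_t perm;        /* permission bits only */
};

/* The attributes seen through one particular mount of the tree. */
static struct fake_inode effective_inode(struct fake_inode ino,
                                         const struct mount_defaults *d)
{
    assert(d != NULL);
    if (ino.inherit & INHERIT_UID)
        ino.uid = d->uid;
    if (ino.inherit & INHERIT_GID)
        ino.gid = d->gid;
    if (ino.inherit & INHERIT_MODE)
        ino.mode = (ino.mode & ~(mode_t)07777) | (d->perm & 07777);
    return ino;
}
```

A filesystem that stores real per-file attributes simply never sets the flags, so nothing changes for it.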

        As for the automounting - d'oh. Add the mount trap as the last
component into /dev. Period. It will be triggered if lookup in other
components fails and only then. And leave the usual device nodes alone -
they are very happy as they are.
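The lookup order described above can be modeled as follows, assuming /dev is an ordered stack of components with the mount trap after all of them (component, union_lookup() and the trap_fired flag are hypothetical; what the trap actually mounts is left out). Ordinary nodes are found first, and the trap is reached only when every other component misses.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* One component of the /dev stack: the names it can answer for. */
struct component {
    const char *const *names;   /* NULL-terminated list */
};

static int component_has(const struct component *c, const char *name)
{
    for (const char *const *p = c->names; *p != NULL; p++)
        if (strcmp(*p, name) == 0)
            return 1;
    return 0;
}

/* Returns the index of the component that satisfied the lookup, or -1.
 * *trap_fired is set only when every ordinary component missed. */
static int union_lookup(const struct component *comps, size_t n,
                        const char *name, int *trap_fired)
{
    assert(comps != NULL && trap_fired != NULL);
    *trap_fired = 0;
    for (size_t i = 0; i < n; i++)
        if (component_has(&comps[i], name))
            return (int)i;
    *trap_fired = 1;   /* last resort: hand the name to the mount trap */
    return -1;
}
```

The usual device nodes live in the earlier components and are never affected by the trap.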

        ASCII files are good. Random magic is bad. Bit-stuffing and
overloading existing semantics is _always_ bad.

This archive was generated by hypermail 2b29 : Mon May 15 2000 - 21:00:20 EST