Some ideas

Jamie Lokier (
Thu, 24 Apr 1997 01:44:38 +0100

Some ideas...

1. How about compiling modules as ELF shared objects, instead of
relocatable files? This has a few advantages:

* Module dependencies can be encoded in the modules themselves.

* Missing symbols can be determined at link time, not when the
modules are loaded.

* Loading an ELF shared object is very simple; the format of the
dynamic relocation info is designed to be particularly easy
to parse (especially if you don't care to check it). So simple
in fact, that it could be incorporated into the kernel if that
is useful (e.g., with romfs).

* Symbol versioning can be used to restrict the set of symbols
exported by a module, much like `register_symtab' does now.

Note that shared objects don't need to be compiled with `-fPIC', so this
incurs no run-time performance hit. (`-fPIC' is just used to make the
image share better between processes, but this is not an issue for
kernel code).

Also, although normal shared libraries retain the symbol and relocation
tables in memory, this is not at all necessary.
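
To give a feel for how little parsing is involved, here is a minimal
sketch of the relocation pass, assuming an i386-style REL table and
handling only `R_386_RELATIVE' entries. The name `apply_relative_relocs'
and its parameters are made up for illustration. A real loader would
also resolve symbol-based relocations against the kernel symbol table,
which this sketch simply skips and counts.

```c
#include <assert.h>
#include <elf.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/*
 * Hypothetical helper: apply the R_386_RELATIVE entries of a REL table
 * to an image that will run at load_addr. Returns the number of entries
 * skipped because this sketch does not handle them (wrong type or an
 * offset outside the image).
 */
static size_t apply_relative_relocs(uint8_t *image, size_t image_size,
                                    uint32_t load_addr,
                                    const Elf32_Rel *rel, size_t count)
{
    size_t skipped = 0;

    for (size_t i = 0; i < count; i++) {
        uint32_t word;

        if (ELF32_R_TYPE(rel[i].r_info) != R_386_RELATIVE ||
            image_size < sizeof word ||
            rel[i].r_offset > image_size - sizeof word) {
            skipped++;
            continue;
        }

        /* REL entries keep the addend in the word being relocated. */
        memcpy(&word, image + rel[i].r_offset, sizeof word);
        word += load_addr;
        memcpy(image + rel[i].r_offset, &word, sizeof word);
    }
    return skipped;
}
```

The table itself would be found through the DT_REL and DT_RELSZ entries
of the dynamic section, which is a similarly short loop.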

2. While we're here, how about using constructor/destructor functions
instead of `init_module' and `cleanup_module'?
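
For illustration, this is roughly what that looks like with the GCC
function attributes, shown here in user space. The names `example_init',
`example_exit' and `example_is_ready' are hypothetical, and how a module
loader would find and run these functions is an assumption, not
something that exists today.

```c
#include <assert.h>

static int module_ready;

/* Runs when the object is loaded, in place of `init_module'. */
__attribute__((constructor))
static void example_init(void)
{
    module_ready = 1;
}

/* Runs when the object is unloaded, in place of `cleanup_module'. */
__attribute__((destructor))
static void example_exit(void)
{
    module_ready = 0;
}

/* Lets a caller check that the constructor has already run. */
int example_is_ready(void)
{
    return module_ready;
}
```

One thing to settle would be error reporting: `init_module' returns a
status, whereas constructors return nothing.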

3. It may be possible to arrange for /proc/kcore to include an ELF DT_DEBUG
tag pointing at the list of loaded modules, in the same way as a
dynamic linker does it. Then `gdb vmlinux /proc/kcore' could
automatically load the symbol tables for currently loaded modules at the
right locations. <link.h> defines the structures for this.
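
As a user-space illustration of the mechanism, the sketch below finds
the DT_DEBUG entry of the running program and walks the list of loaded
objects it points to. The helper names `find_r_debug' and
`count_loaded_objects' are made up; `struct r_debug' and
`struct link_map' come from <link.h>. It assumes a dynamically linked
program whose dynamic linker fills in DT_DEBUG, and returns nothing
useful otherwise.

```c
#include <assert.h>
#include <link.h>
#include <stddef.h>

/*
 * Supplied by the linker for dynamically linked programs. Declared weak
 * so that a static build still links (the symbol is then NULL).
 */
extern ElfW(Dyn) _DYNAMIC[] __attribute__((weak));

/* Hypothetical helper: locate the debugger rendezvous structure. */
static struct r_debug *find_r_debug(void)
{
    if (_DYNAMIC == NULL)
        return NULL;

    for (ElfW(Dyn) *d = _DYNAMIC; d->d_tag != DT_NULL; d++) {
        if (d->d_tag == DT_DEBUG)
            return (struct r_debug *)d->d_un.d_ptr;
    }
    return NULL;
}

/* Hypothetical helper: count entries on the loaded-object list. */
static size_t count_loaded_objects(void)
{
    struct r_debug *dbg = find_r_debug();
    size_t n = 0;

    if (dbg == NULL)
        return 0;

    /* Each link_map carries the load address (l_addr) and name (l_name). */
    for (struct link_map *m = dbg->r_map; m != NULL; m = m->l_next)
        n++;
    return n;
}
```

A debugger reads the same list out of the target's memory; the kernel
would have to present its module list in this layout for the trick to
work.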

4. This one is really adventurous. How about arranging a (non-module)
shared library such that it runs either (a) with special privileges
(e.g., as "root"), or (b) in kernel space? In both cases, the idea
is that it is callable from user space just like a normal shared
library, but the calls to the kernel/privileged part are fixed up to
jump through a call gate or syscall to do the privilege switch. (And
so are the returns).

This might:

* Remove the need for many ioctls, replacing them with typed
function calls.

* Remove the need for some kinds of device.

* Remove some dynamic configuration code from locked kernel memory.

* Allow privileged services to be provided through library
mechanisms instead of daemons, where it makes more sense to do so.

* Where a daemon is not an option, allow programs that currently
have to run setuid root because their libraries need special
privileges (e.g., libvga and libkb, XF86DGA, maybe future things
like libutmp) to be run as ordinary users. There
are other examples. Programs that need limited guarantees on
real time performance, or limited page locking capabilities
spring to mind. There is no need to give these programs full
root privileges.

Of course this is rather non-Unix, avoiding devices and daemons. But
Linux is more modern than Unix. Modern interpreted scripting
languages can call any old shared library these days, so calling
privileged code in this way is often as simple as using a device.
For those ioctl occasions, calling a library function is invariably
simpler. And the ability to call privileged services without being
setuid root and without a daemon is a definite bonus.

Actually some of this can be almost simulated, albeit a bit slowly
and imperfectly, in user space using `clone'. Start the program as
root. Clone a thread to be the privileged thread, and drop
privileges in the main thread. Run the two in lockstep. Every time
the main thread wants to call privileged code, it stores some
parameters somewhere and sends a message to the other thread. That
then does the operation, passes a reply message back, and waits for
another message. Then the main thread continues.
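
A minimal sketch of the lockstep scheme, using `clone' with CLONE_VM and
a pair of pipes for the messages. All the names here (`priv_channel',
`priv_thread', `priv_call', `demo_lockstep') are hypothetical, the
"privileged operation" is a placeholder that doubles its argument, and
no privileges are actually dropped; a real program would do that in the
main thread right after the clone.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <sched.h>
#include <signal.h>
#include <stdlib.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

struct priv_channel {
    int req[2];     /* main thread -> privileged thread */
    int rep[2];     /* privileged thread -> main thread */
    long arg;       /* parameter, visible to both through CLONE_VM */
    long result;
};

/* Privileged side: wait for a request, do the work, reply, repeat. */
static int priv_thread(void *p)
{
    struct priv_channel *ch = p;
    char op;

    while (read(ch->req[0], &op, 1) == 1 && op != 'q') {
        ch->result = ch->arg * 2;   /* placeholder for the real operation */
        if (write(ch->rep[1], &op, 1) != 1)
            break;
    }
    return 0;
}

/* Main side: store the parameter, send a message, wait for the reply. */
static long priv_call(struct priv_channel *ch, long arg)
{
    char op = 'c';

    ch->arg = arg;
    if (write(ch->req[1], &op, 1) != 1)
        return -1;
    if (read(ch->rep[0], &op, 1) != 1)
        return -1;
    return ch->result;
}

/* Set up both threads, make one call, then shut down. Returns -1 on error. */
static long demo_lockstep(long arg)
{
    struct priv_channel ch;
    size_t stack_size = 64 * 1024;
    char *stack = malloc(stack_size);
    char quit = 'q';
    pid_t pid;
    long r;

    if (stack == NULL || pipe(ch.req) != 0 || pipe(ch.rep) != 0)
        return -1;

    /* The stack grows down on most architectures, so pass its top. */
    pid = clone(priv_thread, stack + stack_size, CLONE_VM | SIGCHLD, &ch);
    if (pid < 0)
        return -1;

    /* A real program would drop privileges in the main thread here. */
    r = priv_call(&ch, arg);

    if (write(ch.req[1], &quit, 1) != 1)
        r = -1;
    waitpid(pid, NULL, 0);

    close(ch.req[0]); close(ch.req[1]);
    close(ch.rep[0]); close(ch.rep[1]);
    free(stack);
    return r;
}
```

Because both threads share one address space, nothing stops the main
thread from scribbling over `ch' or the other thread's stack.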

This is imperfect because the main thread can clobber the other
thread's code and data, and because you must still run the program
setuid root. `fork' and shared memory cannot be used for some
things, because sometimes (such as libvga or XF86DGA) the
unprivileged thread needs direct access to some device-mapped memory
that only the privileged thread set up. You can set everything up at
the start like libvga, but then you cannot change it. (For example
if you want to vary the size of mmap'd /dev/dsp, with an audio mixing
thread that locks some memory and runs realtime with a watchdog timer
to avoid unfair use of the CPU). Then again this might be solvable
with a `MAP_REVERSE' flag to `mmap', which maps another process'
memory from ours.

I know that GGI aims to solve these difficulties in the specific case
of video access. Of course you still need to be root if you want to
read the keyboard in raw mode or run a realtime sound mixing thread
with some memory locked.

BTW, I've come across these issues while developing a commercial game
using Linux. I have found no way to access all the features the game
needs without the program (or some support program) being setuid root.

5. Speaking of raw keyboard mode, how about a device /dev/kbdN or
somesuch, which returns raw scancodes if you can read it? The normal
terminal never gets put into raw mode, but it doesn't receive any
characters while /dev/kbdN is open. Then if a program using raw
keyboard mode crashes, the keyboard is fine and you don't have to
reset your computer or have a handy other computer on a nearby

-- Jamie Lokier