Re: [patch] fastcall-2.3.32-B6, SYSENTER/SYSEXIT support

Richard Gooch (rgooch@ras.ucalgary.ca)
Sun, 12 Dec 1999 00:04:41 -0700


Linus Torvalds writes:
> On Fri, 10 Dec 1999, Pavel Machek wrote:
> >
> > That is a bad idea. You want (to be able to) compile a 386 kernel with
> > fastcall ability when run on a PII. (That's the one RedHat is going to
> > ship, you know?)
>
> Well, more importantly, I _still_ think that if we add a new way of doing
> system calls, we ALSO have to add a new way to have a unified calling
> convention.
>
> We should NOT say "you can now do system calls fifteen different ways,
> please try to find out which one is best for you".
>
> We SHOULD say "ok, we have ONE new way of making system calls, and the
> kernel will just use the fastest way that works on this CPU".
>
> This is not only an issue of good taste and sane interfaces. It is also an
> issue of long-term maintenance. Anybody can add new features - just look
> at what happened to DOS->Win1->Win2->Win3->Win31->Win95->Win98->...
>
> We want to AVOID that kind of chaos - we do not want to have different
> ways of doing the same things that offer slight advantages over each other.
[...]
> For example, all this discussion about which system calls can be
> done with SYSENTER, and which ones cannot is just looking at things
> the wrong way, and should have convinced people that it is not a
> choice we should even make user space AWARE of. Because the issue
> does not make any sense on a user space level - the whole approach
> is wrong if this becomes an ABI issue.

I agree that we want a simple and scalable ABI.

> Instead, we should just have different classes of system calls:
[...]

Maybe this is trying to abstract it too much? Putting different
classes of system calls into the ABI seems awkward to me.

I propose a much simpler abstraction: set up a global page (which
always appears at a fixed address in user-space), and set up a jump
table. Have one jump vector per system call. That's the ABI. End of
story.

On a dumb CPU, each vector points to a piece of code that implements
the standard syscall. On better CPUs, some syscall vectors will point
to code that uses a better syscall interface (like sysenter). And some
syscall vectors (e.g. gettimeofday(2)) will point to code that reads
some kernel data in the global page, without any switch to kernel
space.

Now we can optimise syscalls on a case-by-case basis, rather than
trying to solve all the problems.
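
The per-vector selection described above can be modelled in plain
user-space C. This is only a sketch under stated assumptions: every name
in it (the demo syscall numbers, the two feature flags, the stub
functions) is made up for illustration, the "kernel entry" is just a
counter, and nothing here touches a real global page.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical syscall numbers, for illustration only. */
enum { DEMO_GETPID = 0, DEMO_GETTIMEOFDAY = 1, DEMO_NR_VECTORS = 2 };

typedef long (*demo_vector_t)(long arg);

/* Stand-in for kernel data exported in the global page. */
static struct { long seconds; } demo_page_data = { 1000 };

static int kernel_entries;  /* simulated user->kernel switches */
static int fast_entries;    /* subset of those made via the fast path */

/* Standard syscall path (models the trap-based entry). */
static long getpid_via_trap(long arg) { (void)arg; kernel_entries++; return 1234; }

/* Faster entry path (models a sysenter-style stub). */
static long getpid_via_fast(long arg)
{
    (void)arg;
    kernel_entries++;
    fast_entries++;
    return 1234;
}

static long time_via_trap(long arg) { (void)arg; kernel_entries++; return demo_page_data.seconds; }

/* Reads the exported data directly: no switch to kernel space. */
static long time_via_page(long arg) { (void)arg; return demo_page_data.seconds; }

static demo_vector_t demo_vectors[DEMO_NR_VECTORS];

/* Fill the table once, choosing the best stub for each syscall. */
void demo_init_vectors(int cpu_has_fast_entry, int time_is_in_page)
{
    demo_vectors[DEMO_GETPID] =
        cpu_has_fast_entry ? getpid_via_fast : getpid_via_trap;
    demo_vectors[DEMO_GETTIMEOFDAY] =
        time_is_in_page ? time_via_page : time_via_trap;
}

/* The caller only knows the syscall number, never the mechanism. */
long demo_syscall(size_t n, long arg)
{
    assert(n < DEMO_NR_VECTORS);
    assert(demo_vectors[n] != NULL);
    return demo_vectors[n](arg);
}
```

The point of the model is that demo_syscall() never changes: only the
table contents differ between a dumb CPU and a better one.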

If you are concerned about multiple jumps, maybe each entry in the
table can itself contain the code for the standard syscall interface
(rather than a jump to it), as long as it doesn't take too many
instructions.

All user-space has to know is that syscall N is made by making a
function call to BASE+k*N, where k is the size of each jump vector,
and whether parameters are passed in registers or on the stack.
BASE and k are fixed for all time.
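
To make the arithmetic concrete, here is a minimal sketch of the
BASE+k*N calculation. The values chosen for BASE and k are placeholders
I picked for the example; the proposal only requires that they never
change. The code computes addresses and does not call through them,
since nothing is mapped there in an ordinary process.

```c
#include <assert.h>
#include <stdint.h>

/* Placeholder values: the proposal fixes BASE and k but does not name them. */
#define DEMO_VECTOR_BASE ((uintptr_t)0x10000)
#define DEMO_VECTOR_SIZE ((uintptr_t)16)

/* Address user-space would call to make syscall number n. */
uintptr_t demo_vector_address(unsigned int n)
{
    return DEMO_VECTOR_BASE + DEMO_VECTOR_SIZE * (uintptr_t)n;
}

/* Inverse mapping: which syscall does a vector address belong to? */
unsigned int demo_vector_number(uintptr_t addr)
{
    assert(addr >= DEMO_VECTOR_BASE);
    assert((addr - DEMO_VECTOR_BASE) % DEMO_VECTOR_SIZE == 0);
    return (unsigned int)((addr - DEMO_VECTOR_BASE) / DEMO_VECTOR_SIZE);
}
```

With these placeholder values, syscall 3 would live at 0x10030. A real
libc stub would cast that address to a function pointer and call it,
using whichever parameter-passing convention the ABI settles on.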

> If people don't like the page mapping idea, then come up with a
> better way, but don't beat the dead horse of exposing SYSENTER.

I'm happy with the page mapping idea, but what concerns me is that we
can end up with a kernel which has a fair bit of code/data embedded in
it, due to the increasing number of syscall instructions. Even if it's
contained in __init sections, it still bloats the kernel image. This
is a particular problem with embedded systems. Config options will
help here, but we have too many of those already.

So I suggest a few possibilities for working around this:

1) the kernel reserves a page (or pages) which is mapped into each
process at the same VA, but is written to by user-space. Perhaps a
syscall/ioctl to write-protect the region once initialised.

2) have a single module which contains all the code/data variants and
writes the appropriate selection to the global page(s)

3) have a collection of modules, one for each CPU (implementation)
type, and user-space picks the correct one to load. The module
initialises the global page.

The advantage of (1) is that it's simple (minimum kernel bloat). The
disadvantages are that it separates code from data (the
gettimeofday(2) case), which Linus doesn't like, and that it requires
some user-space code to use the standard syscall interface (since the
global page won't be initialised yet).
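
The write-then-protect sequence in (1) can be approximated today with
existing POSIX calls. This is a sketch under loud assumptions: it uses an
ordinary anonymous private mapping at whatever address mmap() picks, not
a kernel-reserved page at a fixed VA shared by every process, and
mprotect() stands in for the dedicated syscall/ioctl suggested above. The
function name is made up.

```c
#define _DEFAULT_SOURCE
#include <assert.h>
#include <stddef.h>
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>

/*
 * Copy `len` bytes of table data into a fresh page, then make the page
 * read-only. Returns the page on success, NULL on failure.
 */
const unsigned char *demo_publish_table(const unsigned char *table, size_t len)
{
    assert(table != NULL);

    long page_size = sysconf(_SC_PAGESIZE);
    if (page_size <= 0 || len > (size_t)page_size)
        return NULL;

    /* Step 1: a writable page that user-space initialises. */
    void *page = mmap(NULL, (size_t)page_size, PROT_READ | PROT_WRITE,
                      MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (page == MAP_FAILED)
        return NULL;

    memcpy(page, table, len);

    /* Step 2: write-protect the region once initialised. */
    if (mprotect(page, (size_t)page_size, PROT_READ) != 0) {
        munmap(page, (size_t)page_size);
        return NULL;
    }
    return page;
}
```

Note that the code doing this setup has to run before the page is
usable, which is exactly why it would need the standard syscall
interface.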

The advantage of (2) is that it keeps the kernel image small. The
disadvantages are that it doesn't separate code from data, it's still
a big module to carry around for embedded systems, and also some
user-space code has to use the standard syscall interface.

The advantage of (3) is that it keeps the kernel small and is also
friendly to embedded systems. It also suffers from the code/data
separation and some user-space code having to use the standard syscall
interface.

Regards,

Richard....
Permanent: rgooch@atnf.csiro.au
Current: rgooch@ras.ucalgary.ca
