Re: What I suspect

Linus Torvalds (torvalds@transmeta.com)
Tue, 7 Dec 1999 21:37:01 -0800 (PST)


On 7 Dec 1999, Ulrich Drepper wrote:
>
> What I suspect why you are pushing this technically inferior solution
> is that you want to support just another way of making syscalls which,
> e.g., your own CPU will be using. If this is true you are so
> shamelessly exploiting your position that I really cannot believe it
> is true.

It's not true, and quite frankly I don't understand what your problem is.

The idea of recompiling glibc is so far out that I cannot imagine how you
could seriously believe in it yourself. Do you want somebody like
Red Hat to have a glibc compile as part of the installation?

Yes, some UNIX vendors did things like that (almost all of them relinked
the kernel). They are all dead or dying now, because it is not really a
very acceptable way of doing things. People want to have easy binary
distributions, not yet another version of the fundamental libraries.

Have you ever moved a hard disk from one system to another?

Your suggestion would make that not work any more, if the original system
was installed with "sysenter".

There are other alternatives, no question about that.

Another alternative is to do runtime dynamic linking. That works, but it
slows down the already slow process startup even more. Try running lmbench
with shared libraries vs static libraries, and cry your eyes out. Our
startup latency is already MUCH too high - not because of kernel issues,
but because of things like lazy binding.

Yet another alternative is to have a "syscall only" library, and have the
early bootup sequence set up the proper symbolic links or similar. Again,
that slows down process startup and implies another mmap of another file.
But it could be made less intrusive by having it mapped at a fixed address
etc.

In contrast, my suggestion just works. Sure, you need to fix up the errno
issue, and yes I agree that that _is_ an issue. But you didn't bother to
even consider just working on the thing: it's actually not at all
impossible to handle the errno case too.

Here is one rather trivial way of handling error codes. As an
example, I'll just code up "mmap()" in libc:

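pushl $error_return /* on error, the "ret" below lands here */
movl $__NR_mmap,%eax /* argument setup omitted */
call 0xfffff000 /* or wherever the magic address is */
ret

error_return:
movl %eax,%edx
negl %edx /* the kernel returned -errno */
movl %edx,__thread_errno
movl $-1,%eax
ret /* error_return was already popped: just return to the caller */

and the magic address rules would be something like

- if %eax is not an error code on return (i.e. not in the -4095..-1 range;
a plain "%eax >= 0" test would treat mmap addresses above 2GB as errors),
do a dummy "pop" of the pushed error_return as part of the return (in
effect a "ret $4")
- otherwise just "ret" (which takes us back to the "ret" after the call,
and from there to the error return, which has to fix up "errno")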

See? No recompile necessary, and you don't have to worry about which system
call mechanism to use, because you're essentially telling the kernel to use
whatever it wants to.

Note that the _real_ reason I originally wanted this is because I hate
having the sigreturn trampoline on the stack. It's ugly, and is
conceptually really wrong - the stack shouldn't have to be executable. The
kernel should just write the sigreturn trampoline once, and not mess up
the D$/I$ like it does now.

The "signal handler invocation" test is one of the few lmbench benchmarks
(apart from "process execution") where we lose to some of the other
unixes, and I suspect this is part of the reason: the loop won't stay in
the I$, because Intel has an exclusion policy between the D$ and I$ in
order to avoid aliases.

The signal handler issue shows up on architectures other than x86. On
the Alpha, for example, we have to flush the whole I$ because we're
generating dynamic code. Ugh. There I _know_ it's the reason lmbench
doesn't give us as good numbers as we should get.

So I've considered having a magic user-mapped page for quite some time,
because we have several of these cases where the kernel can generate code
at run-time to its own advantage. The sysenter thing is just an obvious
extension to this - allowing the kernel to generate whatever entry
sequence is the best for that particular CPU.

The same thing is also useful for generating efficient "memcpy()" etc -
without having to have different libraries for MMX etc. How were you
planning on handling that without making process startup slower?

Think of it as a "global library" as opposed to a "shared one". The global
part has many advantages: it will avoid TLB issues, because the magic
page(s) would always be marked global and thus you would never have to
invalidate them on task-switching etc.

Lots of operating systems have global libraries (VMS comes to mind), but
they tend to be far too intrusive in my opinion. But it _does_ make a
lot of sense for fundamental operations like memcpy, system calls etc.

Linus
