Re: Been getting many GPs in Linux 2.0.33 + sshd 1.2.21

Gabriel Paubert (paubert@iram.es)
Wed, 7 Jan 1998 19:41:03 +0100 (MET)


On Tue, 6 Jan 1998, Brad Allen wrote:

> Gabriel,

> Thank you so much. Because of you I'm narrowing my search. I can
> reproduce the General Protection now; it requires running sshd
> *without* the -d option; the -d option does not allow proper debugging
> nor does it exhibit the bug. (There are other reports of this as
> well.)

Ok, I think I have an explanation now, and the kernel is not to blame.
I forgot that this type of problem can also be due to signal handlers
messing with segment registers. The Intel guys who invented the
segmentation system were simply insane. It leads to so many subtle problems
and potential race conditions.

The problem happens because a signal handler modified the processor
context it is given as argument in an uncontrolled way.
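
To make "the processor context it is given as argument" concrete, here is
a minimal sketch of a handler that receives a context pointer and leaves
it alone. It uses the POSIX SA_SIGINFO interface rather than the 2.0-era
i386 calling convention, and the names on_usr1 and raise_and_check are
made up for the illustration; nothing here is taken from sshd.

```c
#define _POSIX_C_SOURCE 200809L
#include <signal.h>
#include <string.h>

/* Set by the handler once it has seen a non-NULL context pointer. */
static volatile sig_atomic_t saw_context = 0;

/*
 * Hypothetical handler. With SA_SIGINFO the third argument points to the
 * saved user context (a ucontext_t). The saved registers are restored from
 * it when the handler returns, so a stray write through this pointer ends
 * up in the registers of the interrupted code.
 */
static void on_usr1(int sig, siginfo_t *info, void *ctx)
{
    (void)sig;
    (void)info;
    saw_context = (ctx != NULL);   /* look, but do not modify */
}

/* Installs the handler, raises SIGUSR1, and returns 1 if a context was seen. */
int raise_and_check(void)
{
    struct sigaction sa;

    memset(&sa, 0, sizeof sa);     /* no uninitialized flag bits */
    sigemptyset(&sa.sa_mask);
    sa.sa_sigaction = on_usr1;
    sa.sa_flags = SA_SIGINFO;

    if (sigaction(SIGUSR1, &sa, NULL) != 0)
        return -1;
    if (raise(SIGUSR1) != 0)
        return -1;
    return saw_context ? 1 : 0;
}
```

A well-behaved handler treats the context as read-only unless it really
means to change where and how the interrupted code resumes.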

Note that I think that this is still a slight kernel bug: it should not
generate an oops when user mode passes a bad parameter. I have had a look
at arch/i386/kernel/signal.c and my interpretation is that the segment
checking code still leaves some holes.

However the kernel ends up killing the process with a SIGSEGV, which is
what it should do. There are basically two solutions to this problem IMHO:

1) make more thorough (and hence slower) checks. However, the current code is
secure, and some checks are quite complex; they are better left to the
processor and handled as below.

2) modify the GPF handler code so that it knows that some of the instructions
in entry.S, namely the segment register pops and iret, actually load
user mode values, and that if they fault, the process should be forcibly killed
with a SIGSEGV (and for once the SEGmentation Violation deserves its name).
This does not justify an oops!
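
Solution 2 could look roughly like the sketch below: the fault handler
consults a table of code ranges known to reload user-supplied state, and
picks SIGSEGV over an oops when the faulting address falls inside one.
This is plain user-space C for illustration only; the type and function
names (gp_action, user_reload_range, classify_gp_fault) and the table
itself are hypothetical, not existing kernel code.

```c
#include <stddef.h>

/* What to do about a general protection fault taken in kernel mode. */
enum gp_action { GP_OOPS, GP_KILL_SEGV };

/*
 * Hypothetical table entry: one range of kernel text that pops segment
 * registers or executes iret with values supplied by user mode.
 */
struct user_reload_range {
    unsigned long start;   /* first address in the range */
    unsigned long end;     /* one past the last address */
};

/*
 * Returns GP_KILL_SEGV if the faulting eip lies in any listed range,
 * meaning the fault was caused by bad user data and the process should
 * simply be killed. Anything else is a genuine kernel problem.
 */
enum gp_action classify_gp_fault(unsigned long eip,
                                 const struct user_reload_range *tbl,
                                 size_t n)
{
    for (size_t i = 0; i < n; i++) {
        if (eip >= tbl[i].start && eip < tbl[i].end)
            return GP_KILL_SEGV;
    }
    return GP_OOPS;
}
```

The point of the design is that the complex descriptor checks stay in
the processor; the kernel only has to remember where it trusted user
values.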

> I'm sorry that I cannot find time right at this moment to find
> strace options to show address in sshd; any suggestions of where
> to proceed to track this bug's source would be appreciated in the
> meanwhile.

Here follows my rough interpretation of what happens:

[snipped]

[pid 4916] 10:37:18.154185 geteuid() = 1170
[pid 4916] 10:37:18.156219 _exit(0) = ?
[pid 4915] 10:37:18.157397 <... close resumed> ) = 0
[pid 4915] 10:37:18.158037 --- SIGCHLD (Child exited) ---
[pid 4915] 10:37:18.159680 wait4(-1, [WIFEXITED(s) && WEXITSTATUS(s) == 0], 0, NULL) = 4916
[pid 4915] 10:37:18.160820 open("/var/log/lastlog", O_RDONLY) = -1 ENOENT (No such file or directory)
[pid 4915] 10:37:18.165541 fork() = 4917
[pid 4915] 10:37:18.167891 close(10 <unfinished ...>
[pid 4917] 10:37:18.168607 getpid( <unfinished ...>
[pid 4915] 10:37:18.169262 <... close resumed> ) = 0
[pid 4917] 10:37:18.169912 <... getpid resumed> ) = 4917
[pid 4915] 10:37:18.171019 dup(6 <unfinished ...>
[pid 4917] 10:37:18.171854 close(5 <unfinished ...>
[pid 4915] 10:37:18.172521 <... dup resumed> ) = 8
[pid 4917] 10:37:18.173165 <... close resumed> ) = 0
[pid 4915] 10:37:18.174129 sigaction(SIGCHLD, {0x8051950, [], SA_INTERRUPT|0x415352}, <unfinished ...>
[pid 4917] 10:37:18.176762 setsid( <unfinished ...>
[pid 4915] 10:37:18.177597 <... sigaction resumed> {SIG_DFL}) = 0
[pid 4917] 10:37:18.178799 <... setsid resumed> ) = 4917
[pid 4915] 10:37:18.179923 fcntl(6, F_SETFL, O_RDONLY|O_NONBLOCK <unfinished ...>
[pid 4917] 10:37:18.180885 close(6 <unfinished ...>
[pid 4915] 10:37:18.181705 <... fcntl resumed> ) = 0
[pid 4917] 10:37:18.182363 <... close resumed> ) = 0
[pid 4915] 10:37:18.183083 select(9, [7 8], [], NULL, NULL <unfinished ...>
...
[snipped (process 4917)]
...
[pid 4915] 10:37:21.050908 <... select resumed> ) = 1 (in [8])
[pid 4915] 10:37:21.051884 --- SIGCHLD (Child exited) ---
[pid 4915] 10:37:21.053327 wait4(-1, [WIFEXITED(s) && WEXITSTATUS(s) == 0], 0, NULL) = 4917
[pid 4915] 10:37:21.054733 sigaction(SIGCHLD, {0x8051950, [], SA_STACK|SA_RESTART|SA_INTERRUPT|SA_ONESHOT|0x7ffef94}, {0x8051950, [], 0}) = 0
[pid 4915] 10:37:21.056571 sigreturn() = ? (mask now [])
[pid 4915] 10:37:21.111877 +++ killed by SIGSEGV +++

Here the sigreturn passes a processor context with a trashed ss. You can
see the layout of the parameters passed to a signal handler in
asm-i386/sigcontext.h; this is of course very architecture dependent.
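
For reference, a sketch of that layout, written from memory with
fixed-width types so it compiles anywhere; check the header itself before
relying on any field or offset. The struct name sigcontext_sketch and the
two helper functions are mine, and the test they perform (non-null
selector, privilege level 3) is only a necessary condition, not what the
kernel actually checks.

```c
#include <stddef.h>
#include <stdint.h>

/* Approximation of the i386 signal context; each selector is padded to 32 bits. */
struct sigcontext_sketch {
    uint16_t gs, gs_pad;
    uint16_t fs, fs_pad;
    uint16_t es, es_pad;
    uint16_t ds, ds_pad;
    uint32_t edi, esi, ebp, esp;
    uint32_t ebx, edx, ecx, eax;
    uint32_t trapno, err;
    uint32_t eip;
    uint16_t cs, cs_pad;
    uint32_t eflags;
    uint32_t esp_at_signal;
    uint16_t ss, ss_pad;
    uint32_t fpstate;      /* a pointer in the real header; 32 bits on i386 */
    uint32_t oldmask;
    uint32_t cr2;
};

/* Non-null selector whose low two bits (the RPL) ask for ring 3. */
static int selector_is_user(uint16_t sel)
{
    return (sel & 0xfffc) != 0 && (sel & 3) == 3;
}

/*
 * Cheap plausibility test for the segments reloaded on return to user mode.
 * Passing it does not prove the descriptors exist or are usable, which is
 * why the pops and iret can still fault.
 */
int context_segments_plausible(const struct sigcontext_sketch *sc)
{
    return selector_is_user(sc->cs) && selector_is_user(sc->ss);
}
```

A handler that scribbles over the word holding ss will usually fail even
this cheap test, but a value that passes it can still name a descriptor
the processor refuses to load.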

10:37:21.112572 <... select resumed> ) = ? ERESTARTNOHAND (To be restarted)
10:37:21.113282 --- SIGCHLD (Child exited) ---
10:37:21.114688 wait4(-1, [WIFSIGNALED(s) && WTERMSIG(s) == SIGSEGV], WNOHANG, NULL) = 4915
10:37:21.116052 wait4(-1, 0xbffff334, WNOHANG, NULL) = -1 ECHILD (No child processes)
10:37:21.117298 sigaction(SIGCHLD, {0x804a670, [], SA_STACK|SA_RESTART|SA_INTERRUPT|SA_ONESHOT|0x7fff43c}, {0x804a670, [], SA_NOCLDSTOP|0x2a}) = 0
10:37:21.120530 sigreturn() = ? (mask now [])

This sigreturn did not trash any segment OTOH.

10:37:21.123416 select(6, [5], NULL, NULL, NULL) = 1 (in [5])
10:38:45.092857 accept(5, {sin_family=AF_INET, sin_port=htons(1021), sin_addr=inet_addr("166.84.186.50")}, [16]) = 7
10:38:45.094952 fork() = 4953
[pid 4893] 10:38:45.097561 close(7 <unfinished ...>
[pid 4953] 10:38:45.098292 close(5 <unfinished ...>

Conclusion: a serious bug in sshd, and IMHO a minor one in the kernel.

Gabriel.