Re: What to do (fwd)

D. Chiodo (djc@hal.microwave.com)
Mon, 9 Sep 1996 13:09:41 -0400 (EDT)


On Sat, 7 Sep 1996, D. Chiodo wrote:

> Date: Sat, 7 Sep 1996 21:52:50 -0400 (EDT)
> From: "D. Chiodo" <djc@hal.microwave.com>
> To: Linus Torvalds <torvalds@cs.Helsinki.FI>
> Cc: linux-kernel@vger.rutgers.edu
> Subject: Re: What to do (fwd)
>
> On Sat, 7 Sep 1996, Linus Torvalds wrote:
>
> > Date: Sat, 7 Sep 1996 23:31:57 +0300 (EET DST)
> > From: Linus Torvalds <torvalds@cs.Helsinki.FI>
> > To: "D. Chiodo" <djc@hal.microwave.com>
> > Cc: linux-kernel@vger.rutgers.edu
> > Subject: Re: What to do (fwd)
> >
> >
> >
> > On Thu, 5 Sep 1996, D. Chiodo wrote:
> > >
> > > I had posted this previously, but no one had any comments/advice/fixes.. I
> > > suspect it was because I buried the data at the bottom..
> > >
> > > Trace: 120a97 <close_fp+33/84>
> > > Trace: 1150e6 <do_exit+116/1f0>
> > > Trace: 2855e40
> > > Trace: 1151c0 <sys_exit>
> > > Trace: 1151ce <sys_exit+e/10>
> > > Trace: 283f41b
> > > Trace: 2855e40
> > > Trace: 10fae8 <do_page_fault>
> > > Trace: 10a63d <error_code+3d/50>
> > > Trace: 10a429 <lcall7+49/50>
> >
> > Looks like you're using the iBCS2 module.. Are you sure the module
> > matches the kernel? If you have kerneld, make sure it isn't loading an
> > old iBCS2 module by mistake or something like that..
> >
> > Linus
>
> First, Thank you very much for your quick reply (Not to mention thanks for
> the alternative to using MickeySoft and Apple O/S's -I work for an ISP
> that uses Linux for its primary servers..)
>
> Yes. I grabbed the latest ibcs, and did a fresh compile.
>
> (And not using kerneld, I have a direct call to insmod in my rc files)
>
> Oops (not the kernel type tho)- Just went to check my version of ibcs, and
> it seems the "latest" version I grabbed was two years old.. I have one
> with:
>
> Thu Jun 6 17:21:00 GMT/BST 1996
>
> as the header in Changelog that I was using previously, that I still got
> the errors with.. I will recompile/reload that one just to be sure though
>
> (Can't test it until Monday at work - it only manifests itself when
> installing/running WP, and I am remote via 28K at the moment)
>
> What _is_ the latest version of iBCS? (And is there an "iBCS" and an
> "iBCS2", the latter of which is newer?)
>
>

Ok.. I reinstalled the Jun 6 iBCS, and I still get the same errors.

Here is a play by play of everything leading up to this:

I had a 1.3.100 system, with everything running fine, including WP/SCO.. I
had a 400MB root fs, and another 400MB drive on a dummy mount point with
stuff like usr/src and usr/local symlinked to subdirs from it..

The second drive crashed hard.. I got a 1Gig, and rebuilt a new system,
starting with Slackware 3.0 (Kernel 1.2.13), and upgrading through
2.0.10, 2.0.12, etc. to 2.0.18 now.

I still had a copy of WP on the old drive, so I tried to use it from
there.. It runs, but when it starts I get an odd message about "Unable to
start a new process", and when it exits, it doesn't release the "licenses"
(license server on same machine), and I get "Segmentation Fault"...

*** IMPORTANT *** ::: IF I mount the old root file system, and chroot to
its mount point, and run WP from there, (using TCP/IP to get to the local
X server) it runs _PERFECTLY!_ I don't get the message, it releases the
licenses, doesn't GP.. ** Same iBCS, same kernel ** .. This obviously
cannot be a permanent solution.. I need it to work right..

I believe it cannot be the kernel that is causing this (although it may be
aggravating the symptoms)..

I have PORED over the old filesystem, comparing it with the new, looking
for anything that might be affecting this.. I have concentrated on the
libraries, but as far as I can tell nothing is missing from the new
system.. All library versions seem to be the same.. I've looked over /dev
for anything that might be missing.. I don't know what else to look at..
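A comparison like the one described above can be done mechanically rather than by eye. The following is only a minimal sketch in Python, assuming the old root is mounted somewhere readable; the function names and the example mount point are hypothetical. It records symlink targets, file checksums and device numbers for every entry under two roots, then reports what is missing, extra, or different.

```python
import hashlib
import os


def describe(path):
    """Return a comparable fingerprint for one directory entry."""
    if os.path.islink(path):
        # Compare symlinks by target, without following them.
        return ("symlink", os.readlink(path))
    if os.path.isdir(path):
        return ("dir", None)
    if os.path.isfile(path):
        with open(path, "rb") as fh:
            return ("file", hashlib.sha256(fh.read()).hexdigest())
    # Device nodes, FIFOs, sockets: compare type bits and device numbers.
    st = os.lstat(path)
    return ("other", (st.st_mode, st.st_rdev))


def snapshot(root):
    """Map each path relative to root onto its fingerprint."""
    entries = {}
    for dirpath, dirnames, filenames in os.walk(root):
        for name in dirnames + filenames:
            full = os.path.join(dirpath, name)
            entries[os.path.relpath(full, root)] = describe(full)
    return entries


def compare_trees(old_root, new_root):
    """Return (missing, extra, changed) lists of relative paths."""
    old, new = snapshot(old_root), snapshot(new_root)
    missing = sorted(set(old) - set(new))   # in old tree only
    extra = sorted(set(new) - set(old))     # in new tree only
    changed = sorted(p for p in set(old) & set(new) if old[p] != new[p])
    return missing, extra, changed


# Hypothetical usage, with the old root mounted on /mnt/oldroot:
#   missing, extra, changed = compare_trees("/mnt/oldroot/lib", "/lib")
```

Running it separately over lib, usr/lib, etc and dev keeps the output manageable. It only shows where the two trees differ, not which difference matters.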

How can I find out what is causing this and how to fix it?

I ALSO tried reinstalling WP from scratch on the new system.. It
installed, but while it was installing, a billion GP's scrolled up the
console.. And not just from the WP programs, but from common things like "ln"
and "rm"... And when it was done, the resulting install wouldn't run
at all..

I am completely confused on this..

Kernel is 2.0.18.. libc 5.3.12 and 4.7.6
(Note: although the symlink is there from libc.so.5 -> libc.so.5.3.12,
ldconfig -v doesn't show it.. But the OLD system was like this too)

I don't know what other information would help..

Here is the output from the insmod of iBCS..

hal:/lib/modules/2.0.18/misc# insmod -v iBCS
Section 1: (.text) at 0x400b9010
Section 2: (.rel.text) at 0x8054638
Section 3: (.rodata) at 0x400cb634
Section 4: (.rel.rodata) at 0x8059260
Section 5: (.data) at 0x400cf358
Section 6: (.rel.data) at 0x8059328
Section 7: (.bss) at 0x400d29c8
Section 9: (.comment) at 0x400d26f8
Section 10: (.shstrtab) at 0x805c898
Section 11: (.symtab) at 0x805c900
Section 12: (.strtab) at 0x805f5c0
textseg = 0x400b9008
bss_size = 24
last byte = 0x400d29e0
module size = 104920
versioned kernel: yes
versioned module: yes
ELF kernel
ELF module

Here is the GP from when xwp exits:

general protection: 0000
CPU: 0
EIP: 0010:[<0012c808>]
EFLAGS: 00010286
eax: f000ef6f ebx: 01216414 ecx: 00000000 edx: 0024fb4c
esi: 00000000 edi: f000ef6f ebp: 0119a810 esp: 0122cf10
ds: 0018 es: 0018 fs: 002b gs: 002b ss: 0018
Process xwp (pid: 183, process nr: 51, stackpage=0122c000)
Stack: 00000025 0012094b 01216414 00000000 00000025 00000005 00000001 00115082
00000000 00000001 00860001 02855e18 0011515c 0011516a 00000000 0283f423
00000000 08058560 0086549c bfffead8 bfffeae4 00214000 0196a000 0196a065
Call Trace: [<0012094b>] [<00115082>] [<02855e18>] [<0011515c>]
[<0011516a>] [<0283f423>] [<02855e18>]
[<0010fb20>] [<0010a62d>] [<0010a419>]
Code: 8b 50 48 85 d2 74 22 f6 42 1c 01 74 0f 53 83 c0 48 50 e8 15

And the ksymoops translation:

>>EIP: 12c808 <locks_remove_locks+c/38>
Trace: 12094b <close_fp+37/5c>
Trace: 115082 <do_exit+112/1ec>
Trace: 2855e18
Trace: 11515c <sys_exit>
Trace: 11516a <sys_exit+e/10>
Trace: 283f423
Trace: 2855e18
Trace: 10fb20 <do_page_fault>
Trace: 10a62d <error_code+3d/50>
Trace: 10a419 <lcall7+49/50>
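
Three of the ten trace entries above have no symbol attached. A small Python sketch, offered as an illustration only, splits a ksymoops-style trace into resolved and unresolved frames so that the unresolved addresses stand out. The regular expression is an assumption based on the line format shown here, not on any documented ksymoops output format.

```python
import re

# Trace lines copied from the ksymoops output above.
TRACE = """\
Trace: 12094b <close_fp+37/5c>
Trace: 115082 <do_exit+112/1ec>
Trace: 2855e18
Trace: 11515c <sys_exit>
Trace: 11516a <sys_exit+e/10>
Trace: 283f423
Trace: 2855e18
Trace: 10fb20 <do_page_fault>
Trace: 10a62d <error_code+3d/50>
Trace: 10a419 <lcall7+49/50>
"""

# "Trace: ADDR", optionally followed by "<symbol>" or "<symbol+off/size>".
TRACE_RE = re.compile(
    r"^Trace:\s+([0-9a-f]+)"
    r"(?:\s+<([^>+]+)(?:\+([0-9a-f]+)/([0-9a-f]+))?>)?\s*$"
)


def parse_trace(text):
    """Return one dict per trace line; symbol is None when unresolved."""
    frames = []
    for line in text.splitlines():
        m = TRACE_RE.match(line.strip())
        if not m:
            continue
        addr, symbol, offset, _size = m.groups()
        frames.append({
            "addr": int(addr, 16),
            "symbol": symbol,
            "offset": int(offset, 16) if offset else None,
        })
    return frames


frames = parse_trace(TRACE)
unresolved = [f["addr"] for f in frames if f["symbol"] is None]
print("unresolved:", [hex(a) for a in unresolved])
```

The unresolved addresses (0x2855e18 twice and 0x283f423) sit well above every resolved kernel symbol in the trace. They may belong to a loaded module such as iBCS, or may just be stale values on the stack; the dump alone does not settle which.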

Code: 12c808 <locks_remove_locks+c/38> movl 0x48(%eax),%edx
Code: 12c80b <locks_remove_locks+f/38> testl %edx,%edx
Code: 12c80d <locks_remove_locks+11/38> je 12c831 <locks_remove_locks+35/38>
Code: 12c80f <locks_remove_locks+13/38> testb $0x1,0x1c(%edx)
Code: 12c813 <locks_remove_locks+17/38> je 12c824 <locks_remove_locks+28/38>
Code: 12c815 <locks_remove_locks+19/38> pushl %ebx
Code: 12c816 <locks_remove_locks+1a/38> addl $0x48,%eax
Code: 12c819 <locks_remove_locks+1d/38> pushl %eax
Code: 12c81a <locks_remove_locks+1e/38> call 9090002c <_EIP+9090002c>
Code: 12c81f <locks_remove_locks+23/38> nop
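
As a cross-check on the disassembly, the first three code bytes and the register dump are enough to work out which address the faulting instruction tried to read. This is a rough Python sketch of that arithmetic; it handles only the simple register-plus-8-bit-displacement form seen here and is not a general x86 decoder.

```python
# Standard 32-bit x86 register numbering used in the ModR/M byte.
REGS = ["eax", "ecx", "edx", "ebx", "esp", "ebp", "esi", "edi"]

# First three bytes of the "Code:" line and eax from the register dump.
code = bytes.fromhex("8b5048")
registers = {"eax": 0xF000EF6F}

opcode, modrm, disp8 = code
mod = modrm >> 6          # 1 means an 8-bit displacement follows
reg = (modrm >> 3) & 7    # destination register
rm = modrm & 7            # base register (4 would mean a SIB byte)

# The displacement byte is signed.
if disp8 >= 0x80:
    disp8 -= 0x100

# Opcode 0x8b is "mov r32, r/m32": load from memory into a register.
fault = (registers[REGS[rm]] + disp8) & 0xFFFFFFFF
print(f"movl {disp8:#x}(%{REGS[rm]}),%{REGS[reg]} reads from {fault:#010x}")
# prints: movl 0x48(%eax),%edx reads from 0xf000efb7
```

This agrees with the first line of the ksymoops code listing. The base value f000ef6f (also in edi) looks nothing like the other kernel pointers in the dump, which are all below 0x03000000, so locks_remove_locks was most likely handed a garbage pointer rather than failing on its own.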