Re: BAD CPU --> SIG11 (mine appears to be caused by cache)

wiegley@teamster.usc.edu
Sat, 25 May 1996 19:11:34 -0700 (PDT)


Ok, so I too am having the same problem with an Iwill motherboard
(the P54TSW2 model with built-in AHA-2940UW) and a P166 that I bought
last weekend. I have never had a signal 11 error before, and before I
go shooting my mouth off to the dealer I did some testing as you all
suggested...

This board comes with a 512K pipelined burst cache soldered to the
board. With the external cache turned on in the BIOS I can run

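#!/bin/sh
# Build the kernel 50 times, saving each build's output to log.00 ... log.49
for i in 0 1 2 3 4; do
    for j in 0 1 2 3 4 5 6 7 8 9; do
        make clean
        make zImage > "log.$i$j" 2>&1   # POSIX form of csh/bash ">&"
    done
done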

and approx 1 out of every 4 'make zImage's fails with

gcc: Internal compiler error: program cc1 got fatal signal 11

and they fail in different spots, of course, so I doubt it's software
related. (With the external cache turned on I also seem to always have
corrupt filesystems when I reboot and get to the fsck checkpoint, if
that matters any).
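To put a number on the "1 out of every 4" figure, the 50 logs written by the
loop above can be tallied automatically. This is a minimal sketch, not part of
the original test: the helper name count_sig11 is hypothetical, and it assumes
the logs are named log.00 ... log.49 (as the loop produces) and that a failed
build contains gcc's "fatal signal 11" message.

```shell
#!/bin/sh
# count_sig11 DIR (hypothetical helper): report how many build logs in DIR,
# named log.00 ... log.49, contain gcc's "fatal signal 11" message.
count_sig11() {
    total=0
    failed=0
    for f in "$1"/log.[0-4][0-9]; do
        [ -f "$f" ] || continue   # pattern matched no files
        total=$((total + 1))
        if grep -q "fatal signal 11" "$f"; then
            failed=$((failed + 1))
        fi
    done
    echo "$failed of $total builds died with signal 11"
}

# Run from the directory where the build loop wrote its logs
count_sig11 .
```

Running it once with the cache enabled and once with it disabled gives two
directly comparable failure counts to show the dealer.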

With the external cache turned off I can do the same thing, and it
builds all 50 kernels without a hitch: no signal 11 or any other
errors. Of course, this takes significantly longer, and running without
the cache at all is undesirable.

Can I be pretty certain then at this point that this error I am
getting is due to a faulty cache on this motherboard and that it
should be exchanged? This motherboard is supposed to support 180 MHz
and 200 MHz CPUs as well, so I would think that the cache should perform
flawlessly with a 166 MHz CPU. Am I wrong here?

Or am I missing something entirely? Could it actually be the CPU's
fault when accessing cache memory, or something like that? I don't have another
P166 lying around to do testing with and I would like to be as firm as
possible with my dealer about getting a replacement on Tuesday. 'Cause
you all have probably been through the drill...

dlr: What problem are you having?
you: gcc fails because of signal 11 errors caused by faulty hardware.
dlr: gcc? signal 11?? never heard of these errors, what software are
you using?
you: Linux, a superb and excellent operating system, but that is beside
the poi...
dlr: oh, it must be a problem with the software. What errors is DOS or Windows
giving you?
you: no, it's not the software, and I don't use DOS...
dlr: you should use Window...
you: this is not about what O/S I use. With the cache turned on it
screws up; with the cache disabled it works fine, hence it
must be a problem with the cache...
dlr: but if you don't use DOS how do you know you are getting errors?...
you: AAAAAAARRRRRRRRRRRGGGGGGGGHHHHHHHHHHH! blam blam blam blam blam
tv: earlier today a terrible shooting occurred in a neighborhood shop...

So would you, or would you not, say this pretty much nails it down as
an external cache/motherboard problem? (I can't replace just the
cache)

Thanks for your diagnosis and opinion on this matter, I appreciate it.

- Jeff Wiegley

On Fri, 24 May 1996, Darrin R. Smith wrote:

> Jan Kees Joosse wrote:
> >
> > In article <Pine.LNX.3.93.960512232530.407C-100000@jcoy-ppp.cscwc.pima.edu>,
> > "Jeff Coy Jr." <jcoy@jcoy-ppp.cscwc.pima.edu> writes:
> > >On Sun, 12 May 1996, lilo wrote:
> > >
> > >=> There are also known irregularities with L2 writeback cache and
> > >=> cleanroom-microcode AMD DX2/80's. I've never seen anything more than weird
> > >=> bogomips values though....
> > >=>
> > >i have a 486dx2-80, and the sig 11's only showed up for me when i really
> > >pushed the machine, like when i recompiled gcc. the first build would be
> > >fine, but the second build would crash about 45 minutes in. the crashes
> > >weren't really frequent, but setting the CPU speed back to 66 MHz solved
> > >the problem and offered the fastest compile time.
> > >
> > >the only other solution was turning off the external cache, but without
> > >the external cache, first-stage gcc compile was 36 minutes at 80 MHz, but at
> > >66 MHz i get 26-minute first-stage builds.
> >
> > Take a look at http://www.bitwizard.nl/sig11/ This page describes
> > causes for a signal 11 while compiling (esp. kernel-compiles).
> >
> > The effects of turning off the external cache also point in the direction
> > of memory subsystem problems. When you set the CPU speed to 80 MHz, the
> > motherboard operates at 40 MHz, but when the CPU clock is set to 66 MHz, the
> > board operates at 33 MHz. Maybe slow memory? Check the wait states.
> >
> > Jan Kees
>
> Incidentally, one fix for the 80 MHz AMD might have been to
> go to Radio Shack or the local electronics store and buy a little
> tube of white silicon heatsink compound. I put a thin layer of this
> between my cpu/heatsink combination and I have never had a problem
> with my own AMD-80 chip. This will effectively increase the cooling
> ability of the heatsink and fan by providing better contact between
> the processor and heatsink. It's scary to think that most processors
> just have the heatsink clipped on with only bare contact to do the
> cooling.
>
> --Darrin
>
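Jan Kees's bus-speed point can be made concrete with a little arithmetic. A
DX2 runs its core at twice the external bus clock, so the cache and memory see
half the CPU frequency, and each bus cycle lasts the reciprocal of that. This
is a rough illustration only, not a timing analysis of any particular board;
real cache and DRAM timing also depends on the wait states set in the BIOS.

```latex
\begin{aligned}
f_{\text{bus}} &= \frac{f_{\text{CPU}}}{2}, \qquad T_{\text{bus}} = \frac{1}{f_{\text{bus}}} \\
f_{\text{CPU}} = 80\ \text{MHz} &\;\Rightarrow\; f_{\text{bus}} = 40\ \text{MHz},\quad T_{\text{bus}} = 25\ \text{ns} \\
f_{\text{CPU}} = 66\ \text{MHz} &\;\Rightarrow\; f_{\text{bus}} = 33\ \text{MHz},\quad T_{\text{bus}} \approx 30\ \text{ns}
\end{aligned}
```

So at 80 MHz the memory subsystem gets roughly 5 ns less per bus cycle, which
is consistent with the suggestion that marginal cache or DRAM that copes at
33 MHz may start failing at 40 MHz unless extra wait states are added.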