Re: SCSI Kernel Problem - BAD

Steve Thompson (stevet@bofh.1984.net)
Sun, 10 Mar 1996 11:10:21 -0500 (EST)


-----BEGIN PGP SIGNED MESSAGE-----

On Sat, 9 Mar 1996, Eric Youngdale wrote:

> >Agreed. I expect one of the reasons the abort/reset code has not been
> >completely debugged is that in normal operation it gets invoked so rarely. If
> >there is now a higher level bug leading to timeout problems, that would cause
> >the abort/reset code to be used much more than ever before. It's only recently
> >that I've come up with a way of reliably generating such problems.

Just to throw my two cents into the pot here, I have experienced similar
random crashes myself. (In fact, I'm on-site now restoring some poor slob's
home directory from tape since the fsck lost some stuff...)

This problem has been present throughout the entire 1.3.x series, although
I'm sure some of the crashes I was experiencing were due to older versions
of the aic7xxx driver. However, I have noted one factor that leads me to
believe that there may be a race condition somewhere. I have two news
servers running at two sites. One is a 486/66 with 32MB RAM, a 2940W,
Micropolis hdd (x2), an ISA SMC NIC, etc., while the other is identical save
that it is a P-100 and gets hit far harder. The P-100 system has _yet_ to
crash with timeout errors. It's being fed on a T1, to boot. (The other
system is on a 128K ISDN line.)

Recently, I got an active terminator for the 486 and it's been OK
for three days, but this does not necessarily explain some of the symptoms.
One odd fact is that I have seen the message "SCSI command aborting due to
timeout for PID xxxxxx" where xxxxxx _is_ a six-digit number. I was under
the impression that PIDs were 15 bits.
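
As a quick sanity check on that impression, here is a minimal sketch of the
arithmetic. It assumes the number in the message is a process ID limited to
15 bits; that is only an assumption, and the value the driver prints may be
something else entirely. The names PID_BITS, MAX_PID, fits_in_pid_bits and
max_decimal_digits are made up for illustration.

```python
# Assumption: process IDs fit in 15 bits, as stated in the text above.
PID_BITS = 15
MAX_PID = (1 << PID_BITS) - 1  # largest unsigned value that fits in 15 bits


def fits_in_pid_bits(value, bits=PID_BITS):
    """Return True if value is representable as an unsigned integer of `bits` bits."""
    return 0 <= value < (1 << bits)


def max_decimal_digits(bits=PID_BITS):
    """Return how many decimal digits the largest `bits`-bit value needs."""
    return len(str((1 << bits) - 1))


if __name__ == "__main__":
    print(MAX_PID)                   # 32767
    print(max_decimal_digits())      # 5
    print(fits_in_pid_bits(100000))  # False: smallest six-digit number
```

Under that assumption no six-digit value can be a valid PID, so the number in
the timeout message is presumably not a 15-bit process ID.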

These problems that I'm experiencing _could_ be related to termination, but
back in September, I was getting one-month uptimes with 1.3.9 when I was
relying on the termination on the Micropolis drives and the 2940W. It's odd
that the (three, actually) Pentium systems don't have the same problems
that the 486 ones do. And I am using the same compiler (gcc 2.6.2, 4.6.27+,
- -fno-strength-reduce) on all of them.

BTW, Micropolis drives suck if you bought them in the summer/fall.

I'll post more info if I have any more observations to share.

Steve Thompson, System Administrator & Professional Malcontent
- --------------------------------------------------------------
"Your face alabaster/no cracks in the plaster/image carefully
contrived." - Vital Sines

-----BEGIN PGP SIGNATURE-----
Version: 2.6.i

iQCVAgUBMUL+5a/aN9TAUvclAQGSCQP+LMaNoYq2ccABetNfJi9xFFlULw8cuPvE
cVtidAGza4ayfhj6LUytNOqETr0fcQCpI1KOwFulIR+oTNfc4t75jbe+RINa7Dm1
pHxE+QsmpiuSlUh/TUaPJVAG5k0Y6LcdZuNcnUTERr+GeF1qGgMoL7pg4ZU638kY
ahWIXfnUBjA=
=RSrx
-----END PGP SIGNATURE-----