HELP! Bug in aic7xxx + scsi?

Steve Sparks (sparks@socketware.com)
Tue, 16 Nov 1999 12:03:59 -0500


I've got a problem I'm having a hard time debugging, but here's what I have
so far. (These are two bugs; one should be easy to find and fix.)

1) I have an AIC-7895. When I configure the kernel with the aic7xxx driver
compiled in, the built-in driver initializes, but then the module gets
loaded as well. The module redetects the two channels, for a total of four
hosts, which causes an interrupt conflict and a panic. No biggie, I just
load aic7xxx as a module only, but you might want to fix it at leisure.

2) *THIS IS THE KILLER PROBLEM*

I have a Quantum Viking II 8.7 GB drive. I was running RH5.2 (Linux kernel
2.0.36) on it for ~8 months with no problems. Yesterday I tried to upgrade
to RH6.0 (Linux kernel 2.2.5-15) and immediately began having errors.

The trigger was any bulk read or write, e.g. copying a 150MB file around,
tarring up a big tree, or running a tape backup. The message was something
like

Kernel panic: scsi_free: attempting to free unused memory

I don't recall the exact wording. There were other errors - some bus
timeouts - but most were the above. The machine would not boot after that:
at boot it would run e2fsck on the filesystems since they weren't cleanly
unmounted, and the act of running e2fsck would cause the error again. I ran
out and bought a WD IDE drive and got RH6.0 installed on it, but the
partition table on the Quantum drive is so hosed that fdisk prints an empty
table (though, surprisingly, it properly detects the CHS geometry of the
drive). If anyone has a utility that will allow me to at least detect the
ext2 partitions, I can get my .config and /var/log/messages off the drive to
help debugging.
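
To make clear what I mean by "detect the ext2 partitions", here is a rough
sketch of the kind of scan I have in mind. It relies on the ext2 on-disk
layout: the superblock sits 1024 bytes into the filesystem and carries the
magic number 0xEF53 at byte 56. The function name, the sanity checks, and
the device path in the usage comment are my own assumptions, not an
existing tool, and it should only ever be run read-only against the disk
(or better, against an image of it).

```python
import struct
from typing import BinaryIO, Iterator, Tuple

SECTOR = 512         # scan step; partitions start on sector boundaries
SB_OFFSET = 1024     # ext2 superblock starts 1024 bytes into the filesystem
MAGIC_OFFSET = 56    # offset of s_magic inside the superblock
EXT2_MAGIC = 0xEF53


def find_ext2_superblocks(dev: BinaryIO) -> Iterator[Tuple[int, int, int]]:
    """Yield (start_sector, block_size, block_count) for each candidate.

    A candidate is any sector S where the bytes at S*512 + 1024 look like
    an ext2 superblock. Backup superblocks can match as well, so a hit is
    only a hint about where a partition may begin.
    """
    sector = 0
    while True:
        dev.seek(sector * SECTOR + SB_OFFSET)
        sb = dev.read(64)
        if len(sb) < 64:          # ran off the end of the device or image
            break
        (magic,) = struct.unpack_from("<H", sb, MAGIC_OFFSET)
        if magic == EXT2_MAGIC:
            (blocks,) = struct.unpack_from("<I", sb, 4)    # s_blocks_count
            (log_bs,) = struct.unpack_from("<I", sb, 24)   # s_log_block_size
            # Crude plausibility filter (my assumption) to drop stray matches.
            if blocks > 0 and log_bs <= 6:
                yield sector, 1024 << log_bs, blocks
        sector += 1


# Hypothetical usage; /dev/sdX is a placeholder for the damaged drive:
#   with open("/dev/sdX", "rb") as dev:
#       for start, bs, count in find_ext2_superblocks(dev):
#           print(start, bs, count)
```

Seeking sector by sector is slow on a drive this size, but it keeps the
sketch simple. Each start sector it reports would be a candidate for
re-entering the partition by hand in fdisk.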

I used to have Quantum Atlas SCSI hot-swappables in a Dell PowerEdge 2300
running 2.2.5-15, and it crashed all the time too; our RAID card vendor said
something about Quantum's termination being non-standard, but I didn't know
whether that was tech support BS or for real. After having the Viking blow
up with 2.2.5, I think it was for real.

We replaced the hot-swappables with IBM drives and the condition went away.

In any event, it worked with 2.0.36 scsi; it fails with 2.2.5 scsi. The two
machines have different hardware and different motherboards. The drive
manufacturer is the only obvious similarity, except that both machines are
SMP: one is a dual P2/400, and the PowerEdge is a dual P2/450. In both cases
removing the Quantum drive stopped the problem.

If anyone knows how to get my logs/configs off the Quantum drive, I'd be
super-willing to do that - I need all the freaking files back off it too!
(My tape backup gave me ~75% of the machine back.)

Any, all, and as much help as I can get would earn my eternal gratitude.

-S

--
Steven Sparks                     Socketware, Inc. (http://www.accucast.com)
Guru                              1776 Peachtree St. NW, Suite 500 South
sparks@socketware.com             Atlanta, GA 30308
(404)815-1998 x15                 1-877-4-ACCUCAST (422-2822)
