2.0.35 hard-lockups

Steve Shah (sshah@cs.ucr.edu)
Tue, 6 Oct 1998 15:12:34 -0700


Hello Everyone,

I've got a 2.0.35 Red Hat 4.2 installation that is primarily an
NFS/Samba server. It uses an NE2k NIC and an Adaptec 2940 SCSI card.
It has a Diamond Stealth video card of some kind but X-windows is
strictly forbidden on it. The machine has 128M of RAM and is a P-200.

We're having an ugly problem with it locking up every few days
(anywhere from within 2 days of boot to a little less than 2 weeks).
When it locks, it locks hard: no messages sent to syslog and nothing
on the console. A few times, it locked up on an Oops generated while
dump was running. If memory serves me right, all of the lockups happen
at night, during backups. We use Amanda with dump to back up the
system; this machine is a backup client.

The disk configuration is as follows:

2G /dev/hda
6G /dev/hdb
CDROM /dev/hdc
4G /dev/sda
4G /dev/sdb
4G /dev/sdc
2G /dev/sdd
2G /dev/sde

Each SCSI disk is one big partition. The IDE disks are split into a
couple of smaller partitions, the largest being 2G. /dev/sdc often
sees activity 24 hours a day. The others are busy during the day but
quiet at night. Both IDE disks are busy during the day and quiet at night.

Noteworthy kernel config options:
CONFIG_BLK_DEV_TRITON=y
CONFIG_SYN_COOKIES=y
CONFIG_SCSI=y
CONFIG_BLK_DEV_SD=y
CONFIG_CHR_DEV_ST=m
CONFIG_BLK_DEV_SR=m
CONFIG_CHR_DEV_SG=m
CONFIG_SCSI_MULTI_LUN=y
CONFIG_SCSI_CONSTANTS=y
CONFIG_SCSI_AIC7XXX=y
CONFIG_OVERRIDE_CMDS=y
CONFIG_AIC7XXX_CMDS_PER_LUN=24
CONFIG_AIC7XXX_PROC_STATS=y
CONFIG_AIC7XXX_RESET_DELAY=5

To summarize my question: HELP! I'm baffled by this behavior. I haven't
been able to get a good uptime on it in months. I thought 2.0.34/35 would
fix it (it was running 2.0.33), but it hasn't. I know this hardware
is okay because last year I got over 1 month of uptime out of it. I
upgraded to 2.0.35 because the Adaptec drivers in it are less flaky.

My hunch is that I'm having a SCSI problem, but without log entries
at crash time it's hard to tell. I occasionally get SCSI bus resets,
but I can go for another week after one before the system dies, so I
doubt the resets correlate directly with the lockups. Has anyone else
had this problem? Something similar?

btw, I'm not subscribed to linux-kernel, so please cc me on the
responses.

Thanks,
-Steve

--
______________________________________________________________________________
Steve Shah (sshah@cs.ucr.edu) | SysAdmin/Coder/Gabbernaut/DJ/Writer/Minister
http://www.cs.ucr.edu/~sshah  | We're not dropping out, we're infiltrating.
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
      Beating code into submission, one operating system at a time...
