BUG or hardware failure?

From: Barrett G. Lyon (blyon@theshell.com)
Date: Thu Jul 13 2000 - 18:06:40 EST



Over the last two weeks a heavily loaded Alpha running 2.2.16 has been
experiencing SCSI problems, and I can't tell whether the crashes are
hardware or software related. If it is hardware, is it a drive going bad,
the controller card, a cable, or something else? The machine runs fine
for a few days and then occasionally logs odd messages about inodes that
still have aliases (see table 1). Later, after about 4 days of uptime, it
gets stuck in a loop of SCSI error messages (see table 2). This now
happens about once a week. After the machine is rebooted and fsck has
run, it runs fine for another week or so.

Some insight into this would be helpful: if we have bad hardware I would
like to replace it, but if it's a kernel issue we would like to supply as
much information as possible to help fix it.

Review table 3 for hardware details. Thank you - Barrett

<table 1>
Jul 8 12:17:51 63.236.138.5 kernel: iput: device 08:02 inode 2233042 still has aliases!
Jul 8 12:20:01 63.236.138.5 kernel: iput: device 08:11 inode 3162132 still has aliases!
Jul 8 14:17:01 63.236.138.5 kernel: iput: device 08:02 inode 2233042 still has aliases!
Jul 8 17:05:11 63.236.138.5 kernel: iput: device 08:02 inode 987295 still has aliases!
Jul 8 17:05:12 63.236.138.5 kernel: iput: device 08:02 inode 2233042 still has aliases!
Jul 8 17:06:45 63.236.138.5 kernel: iput: device 08:02 inode 2233042 still has aliases!
Jul 8 17:09:57 63.236.138.5 kernel: iput: device 08:02 inode 2233042 still has aliases!
Jul 8 17:09:59 63.236.138.5 kernel: iput: device 08:02 inode 987295 still has aliases!
Jul 8 17:19:59 63.236.138.5 kernel: iput: device 08:02 inode 987295 still has aliases!
Jul 8 17:20:00 63.236.138.5 kernel: iput: device 08:02 inode 2233042 still has aliases!
Jul 8 18:18:51 63.236.138.5 kernel: iput: device 08:11 inode 4286509 still has aliases!
Jul 8 18:18:51 63.236.138.5 kernel: iput: device 08:11 inode 2848872 still has aliases!
Jul 8 18:18:52 63.236.138.5 kernel: iput: device 08:11 inode 884758 still has aliases!
Jul 8 18:18:52 63.236.138.5 kernel: iput: device 08:11 inode 3289121 still has aliases!
Jul 8 18:18:52 63.236.138.5 kernel: iput: device 08:11 inode 809029 still has aliases!
</table 1>
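
For readers who want to summarize table 1, here is a minimal Python
sketch that tallies the alias warnings per device and inode. It assumes
the device field is the usual hex major:minor pair, that major 8 is the
SCSI disk driver, and that each disk owns 16 minors; the function names
are made up for illustration and are not part of any kernel tool.

```python
import re
from collections import Counter

# Matches the "iput: ... still has aliases!" lines as they appear in table 1.
ALIAS_RE = re.compile(
    r"iput: device ([0-9a-fA-F]{2}):([0-9a-fA-F]{2}) inode (\d+) still has aliases!"
)

SCSI_DISK_MAJOR = 8    # assumption: major number of the sd driver
MINORS_PER_DISK = 16   # assumption: minor 0 is the whole disk, 1..15 are partitions


def device_name(major_hex, minor_hex):
    """Translate a hex major:minor pair such as 08:11 into a name like sdb1."""
    major, minor = int(major_hex, 16), int(minor_hex, 16)
    if major != SCSI_DISK_MAJOR:
        return f"{major_hex}:{minor_hex}"  # not an sd device; leave unchanged
    disk, part = divmod(minor, MINORS_PER_DISK)
    name = "sd" + chr(ord("a") + disk)
    return name + (str(part) if part else "")


def tally_aliases(lines):
    """Count alias warnings per (device name, inode number)."""
    counts = Counter()
    for line in lines:
        m = ALIAS_RE.search(line)
        if m:
            key = (device_name(m.group(1), m.group(2)), int(m.group(3)))
            counts[key] += 1
    return counts
```

Under those assumptions 08:02 maps to sda2 and 08:11 to sdb1, which
matches the partition check in table 3. Running tally_aliases over the
15 lines of table 1 attributes 6 of them to inode 2233042 on sda2.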

<table 2>
Jul 13 12:53:14 63.236.138.5 kernel: SCSI disk error : host 0 channel 0 id 0 lun 0 return code = 28000002
Jul 13 12:53:14 63.236.138.5 kernel: [valid=0] Info fld=0x0, Current sd08:02: sense key Not Ready
Jul 13 12:53:14 63.236.138.5 kernel: Additional sense indicates Logical unit not ready, initializing command required
Jul 13 12:53:14 63.236.138.5 kernel: scsidisk I/O error: dev 08:02, sector 197742
Jul 13 12:53:14 63.236.138.5 kernel: SCSI disk error : host 0 channel 0 id 0 lun 0 return code = 28000002
Jul 13 12:53:14 63.236.138.5 kernel: [valid=0] Info fld=0x0, Current sd08:02: sense key Not Ready
Jul 13 12:53:14 63.236.138.5 kernel: Additional sense indicates Logical unit not ready, initializing command required
Jul 13 12:53:14 63.236.138.5 kernel: scsidisk I/O error: dev 08:02, sector 197754

<the machine crashes at this point>

Jul 13 12:59:59 63.236.138.5 kernel: bash: Exception at [setup_sigcontext+616/736] (fffffc000031740c) handled successfully.
Jul 13 12:59:59 63.236.138.5 last message repeated 4 times
Jul 13 12:59:59 63.236.138.5 kernel: SCSI disk error : host 0 channel 0 id 0 lun 0 return code = 28000002
</table 2>
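
The "return code = 28000002" value in table 2 can be split into its
component bytes. The sketch below is a rough illustration only: it
assumes the value is printed in hex and packed as driver, host, message
and status bytes from high to low, with the driver byte carrying a
suggestion in its upper nibble. The few symbolic names in the lookup
tables are recalled from the kernel's SCSI headers and should be checked
against include/scsi/scsi.h for the kernel actually running.

```python
# Assumed layout: (driver << 24) | (host << 16) | (msg << 8) | status
DRIVER_NAMES = {0x00: "DRIVER_OK", 0x08: "DRIVER_SENSE"}    # partial, from memory
SUGGEST_NAMES = {0x00: "none", 0x20: "SUGGEST_ABORT"}       # partial, from memory
STATUS_NAMES = {0x00: "GOOD", 0x02: "CHECK CONDITION"}      # raw SCSI status byte


def describe_result(code_hex):
    """Split a hex SCSI result word into named fields."""
    result = int(code_hex, 16)
    driver_byte = (result >> 24) & 0xFF
    driver = driver_byte & 0x0F       # low nibble: driver status
    suggest = driver_byte & 0xF0      # high nibble: suggested action
    host = (result >> 16) & 0xFF
    msg = (result >> 8) & 0xFF
    status = result & 0xFF
    return {
        "driver": DRIVER_NAMES.get(driver, hex(driver)),
        "suggest": SUGGEST_NAMES.get(suggest, hex(suggest)),
        "host": host,
        "msg": msg,
        "status": STATUS_NAMES.get(status, hex(status)),
    }


if __name__ == "__main__":
    print(describe_result("28000002"))
```

If the assumed layout is right, 28000002 decodes to a CHECK CONDITION
status with valid sense data and no host adapter error, which is
consistent with the "sense key Not Ready" lines that follow it in the
log.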

<table 3>
Linux version 2.2.16 (root@arsenic.theshell.com) (gcc version egcs-2.91.66 19990314/Linux (egcs-1.1.2 release)) #1 Thu Jun 8 12:20:43 PDT 2000
Booting on EB164 variation SX164 using machine vector SX164 from MILO
Command line: bootdevice=sda2 bootfile=boot/vmlinux.gz root=/dev/sda2
Console: colour VGA+ 80x25
Calibrating delay loop... 529.53 BogoMIPS
Memory: 516888k available
Dentry hash table entries: 65536 (order 7, 1024k)
Buffer cache hash table entries: 524288 (order 9, 4096k)
Page cache hash table entries: 65536 (order 6, 512k)
VFS: Diskquotas version dquot_6.4.0 initialized
POSIX conformance testing by UNIFIX
Alpha PCI BIOS32 revision 0.04
PCI: Probing PCI hardware
SMC37c669 Super I/O Controller found @ 0x3f0
Linux NET4.0 for Linux 2.2
Based upon Swansea University Computer Society NET3.039
NET4: Unix domain sockets 1.0 for Linux NET4.0.
NET4: Linux TCP/IP 1.0 for NET4.0
IP Protocols: ICMP, UDP, TCP, IGMP
TCP: Hash tables configured (ehash 524288 bhash 65536)
Starting kswapd v 1.5
pty: 1024 Unix98 ptys configured
Floppy drive(s): fd0 is 2.88M
FDC 0 is a post-1991 82077
sym53c8xx: at PCI bus 0, device 7, function 0
sym53c8xx: 53c895 detected with Symbios NVRAM
sym53c895-0: rev=0x01, base=0x9001000, io_port=0x8800, irq=26
sym53c895-0: Symbios format NVRAM, ID 7, Fast-40, Parity Checking
sym53c895-0: initial SCNTL3/DMODE/DCNTL/CTEST3/4/5 = (hex) 07/8e/a0/00/00/24
sym53c895-0: final SCNTL3/DMODE/DCNTL/CTEST3/4/5 = (hex) 07/4e/80/00/08/24
sym53c895-0: on-chip RAM at 0x9002000
sym53c895-0: resetting, command processing suspended for 2 seconds
sym53c895-0: restart (scsi reset).
sym53c895-0: enabling clock multiplier
sym53c895-0: Downloading SCSI SCRIPTS.
scsi0 : sym53c8xx - version 1.3g
scsi : 1 host.
sym53c895-0: command processing resumed
sym53c895-0-<0,*>: WIDE SCSI (16 bit) enabled.
sym53c895-0-<0,*>: FAST-40 WIDE SCSI 80.0 MB/s (25 ns, offset 15)
  Vendor: IBM Model: DRVS09V Rev: 0100
  Type: Direct-Access ANSI SCSI revision: 03
Detected scsi disk sda at scsi0, channel 0, id 0, lun 0
sym53c895-0-<1,*>: WIDE SCSI (16 bit) enabled.
sym53c895-0-<1,*>: FAST-40 WIDE SCSI 80.0 MB/s (25 ns, offset 15)
  Vendor: IBM Model: DRVS18V Rev: 0140
  Type: Direct-Access ANSI SCSI revision: 03
Detected scsi disk sdb at scsi0, channel 0, id 1, lun 0
  Vendor: IBM Model: DDRS-34560W Rev: S92A
  Type: Direct-Access ANSI SCSI revision: 02
Detected scsi disk sdc at scsi0, channel 0, id 2, lun 0
sym53c895-0-<3,*>: WIDE SCSI (16 bit) enabled.
sym53c895-0-<3,*>: FAST-20 WIDE SCSI 40.0 MB/s (50 ns, offset 15)
  Vendor: HP Model: 4.26GB B 68-0854 Rev: 0854
  Type: Direct-Access ANSI SCSI revision: 02
Detected scsi disk sdd at scsi0, channel 0, id 3, lun 0
sym53c895-0-<0,0>: tagged command queue depth set to 8
sym53c895-0-<1,0>: tagged command queue depth set to 8
sym53c895-0-<2,0>: tagged command queue depth set to 8
sym53c895-0-<3,0>: tagged command queue depth set to 8
scsi : detected 4 SCSI disks total.
SCSI device sda: hdwr sector= 512 bytes. Sectors= 17928698 [8754 MB] [8.8 GB]
SCSI device sdb: hdwr sector= 512 bytes. Sectors= 35879135 [17519 MB] [17.5 GB]
sym53c895-0-<2,*>: WIDE SCSI (16 bit) enabled.
sym53c895-0-<2,*>: FAST-20 WIDE SCSI 40.0 MB/s (50 ns, offset 15)
SCSI device sdc: hdwr sector= 512 bytes. Sectors= 8925000 [4357 MB] [4.4 GB]
SCSI device sdd: hdwr sector= 512 bytes. Sectors= 8330543 [4067 MB] [4.1 GB]
tulip.c:v0.91g-ppc 7/16/99 becker@cesdis.gsfc.nasa.gov
eth0: MII transceiver #1 config 1000 status 782d advertising 01e1.
Partition check:
 sda: sda1 sda2
 sdb: sdb1
 sdc: sdc1 sdc2 sdc3 sdc4
 sdd: sdd1
VFS: Mounted root (ext2 filesystem) readonly.
Freeing unused kernel memory: 80k freed
Adding Swap: 209800k swap-space (priority -1)
Adding Swap: 201056k swap-space (priority -2)
Adding Swap: 201056k swap-space (priority -3)
</table 3>
