Re: ext2 filesystem corruption?!?!??

Jon Lewis (jlewis@inorganic5.fdt.net)
Sat, 12 Apr 1997 21:05:47 -0400 (EDT)


On Fri, 4 Apr 1997, Jeff Garzik wrote:

> Here's my machine config on which I've been having the filesystem
> corruption problems.

I'm also seeing the ext2 corruption occasionally, primarily on just 2
systems. The first is our news server:

P90, 128MB RAM, 2 NCR 810 (using the BSD-ported driver, tagged queuing off,
no_atime in use, 5MHz sync), 3c509. It has:
Host: scsi0 Channel: 00 Id: 00 Lun: 00
  Vendor: CONNER Model: CP30540 545MB3.5 Rev: AEB8
  Type: Direct-Access ANSI SCSI revision: 02
Host: scsi0 Channel: 00 Id: 02 Lun: 00
  Vendor: IBM Model: DPES-31080 !t Rev: S31R
  Type: Direct-Access ANSI SCSI revision: 02
Host: scsi0 Channel: 00 Id: 04 Lun: 00
  Vendor: IBM Model: DPES-31080 !t Rev: S31K
  Type: Direct-Access ANSI SCSI revision: 02
Host: scsi1 Channel: 00 Id: 01 Lun: 00
  Vendor: MICROP Model: 3243-19MZ Q4D Rev: HT02
  Type: Direct-Access ANSI SCSI revision: 02
Host: scsi1 Channel: 00 Id: 03 Lun: 00
  Vendor: DEC Model: DSP5300S Rev: 427L
  Type: Direct-Access ANSI SCSI revision: 02

All of the above are internal, on Granite Digital cables (it's not a bad
cable problem :). It has 6 feeds and stays pretty busy when not filled up.

The other is our tape backup host: P100, 48MB RAM, 1 NCR 810 (BSD driver,
tagged queuing off), SMC Ultra.
Host: scsi0 Channel: 00 Id: 00 Lun: 00
  Vendor: IBM Model: DALS-3540 !s Rev: S60E
  Type: Direct-Access ANSI SCSI revision: 02
Host: scsi0 Channel: 00 Id: 01 Lun: 00
  Vendor: IBM Model: DORS-32160 !# Rev: WA3E
  Type: Direct-Access ANSI SCSI revision: 02
Host: scsi0 Channel: 00 Id: 04 Lun: 00
  Vendor: PIONEER Model: CD-ROM DRM-602X Rev: 2902
  Type: CD-ROM ANSI SCSI revision: 02
[luns 1-5 on the Pioneer clipped]
Host: scsi0 Channel: 00 Id: 06 Lun: 00
  Vendor: ARCHIVE Model: Python 28388-XXX Rev: 5.AC
  Type: Sequential-Access ANSI SCSI revision: 02

All devices are internal except the CD.

I've seen ext2 corruption with a variety of 2.0.x kernels. The tape host
was running 2.0.29, and today when I got to work, I found a screen full
of:

EXT2-fs error (device 08:16): ext2_new_block: Free blocks count corrupted for block group 48
EXT2-fs error (device 08:16): ext2_new_block: Free blocks count corrupted for block group 48
EXT2-fs error (device 08:16): ext2_new_block: Free blocks count corrupted for block group 48
EXT2-fs error (device 08:16): ext2_new_block: Free blocks count corrupted for block group 48
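
For anyone wondering what that message is complaining about: as far as I
understand the 2.0.x allocator, it picks a block group whose descriptor
claims free blocks, then searches that group's bitmap for a zero bit, and
prints this error when none is found. The sketch below only illustrates that
mismatch. The function names and the in-memory bitmap are made up for the
example; this is not the kernel code.

```python
def first_free_bit(bitmap: bytes, blocks_per_group: int) -> int:
    """Return the index of the first zero bit, or blocks_per_group if none."""
    for bit in range(blocks_per_group):
        # Bit n of the group lives in byte n // 8, at bit position n % 8.
        if not (bitmap[bit // 8] >> (bit % 8)) & 1:
            return bit
    return blocks_per_group


def free_count_corrupted(desc_free_count: int, bitmap: bytes,
                         blocks_per_group: int) -> bool:
    """True when the descriptor claims free blocks but the bitmap has none."""
    no_free_bit = first_free_bit(bitmap, blocks_per_group) >= blocks_per_group
    return desc_free_count > 0 and no_free_bit


# Hypothetical 32-block group: descriptor says 5 free, bitmap is fully set.
full_bitmap = bytes([0xFF] * 4)
print(free_count_corrupted(5, full_bitmap, 32))
```

A descriptor count and a bitmap that disagree like this would mean one of the
two was written wrong or never made it to disk, which says nothing yet about
whether the filesystem code, the driver, or the hardware is at fault.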

I've upgraded it to 2.0.30 now with tagged queuing on and 10MHz sync, plus
Gerard's ext2 debugging patch...sort of hoping it will happen again. The
partition where the corruption was is /home (on the DORS), which is where
the tape scripts log stuff during network backups. That logging is about
all the disk activity this system sees at night.
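
For reference, here is a small sketch of how the "device 08:16" in the error
maps to a partition. It assumes the kernel prints the major and minor numbers
in hex and that SCSI disks use major 8 with 16 minors per disk; the helper
name is made up for the example.

```python
def decode_scsi_dev(dev: str) -> str:
    """Map a 'major:minor' hex pair from a kernel message to an sd name."""
    major_s, minor_s = dev.split(":")
    major, minor = int(major_s, 16), int(minor_s, 16)
    if major != 8:
        raise ValueError("this sketch only handles SCSI disks (major 8)")
    # Each SCSI disk owns 16 minors: 0 is the whole disk, 1-15 are partitions.
    disk, part = divmod(minor, 16)
    name = "sd" + chr(ord("a") + disk)
    return name if part == 0 else f"{name}{part}"


print(decode_scsi_dev("08:16"))  # prints sdb6
```

Under those assumptions 08:16 is the sixth partition on the second SCSI disk,
which fits the DORS at Id 01 if the disks were numbered in Id order.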

------------------------------------------------------------------
Jon Lewis <jlewis@fdt.net> | Unsolicited commercial e-mail will
Network Administrator | be proof-read for $199/hr.
________Finger jlewis@inorganic5.fdt.net for PGP public key_______