ext2 problems on fast-wide SCSI-2

Mike Wangsmo (root@baroque.co.uk)
Fri, 03 May 1996 11:10:59 -0400


I'm sending this to both linux-kernel and linux-scsi, as it would seem
both are potentially appropriate.

I've been trying to bring up a new, fairly high-performance machine
for use as a web/mail/whatever server.

It's a P150 (Tyan Tempest II) with 64MB RAM, a DPT PM2124W Fast/Wide
SCSI-2 controller, a Fujitsu M2934QA 4GB F/W SCSI-2 drive that's been
partitioned to have sda1 as a 300-odd MB root partition and sda4 as a
32MB swap partition, and a 3c595 Ethercard.

All the termination on the SCSI bus seems to be in order, and the
cable length seems OK, with appropriate spacing in between
connections.

I have had, unfortunately, nothing but problems, running a variety of
kernels from 1.3.74 on. I initially chalked them up to the known
fragility of really-fast SCSI systems, but I've been trying to run
1.3.97, which is supposed to be much more reliable, and still have
major problems.

The messages I see most often look like this (reformatted):

May 3 10:29:06 vineland kernel: free_one_pmd: bad directory entry
00400000

sometimes in clusters of as many as 14 (though the directory entry
number varies, it's always 00200000, 00400000 or 00800000).
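
As an aside on the arithmetic, all three reported values have exactly
one bit set. A minimal sketch to check that (the helper name set_bits
is made up for illustration and is not from the kernel sources; the
check says nothing about the cause):

```python
# Values reported in the free_one_pmd messages, as logged (hex).
bad_entries = [0x00200000, 0x00400000, 0x00800000]

def set_bits(value):
    """Return the positions of the bits set in value (bit 0 = least significant)."""
    return [pos for pos in range(value.bit_length()) if (value >> pos) & 1]

for entry in bad_entries:
    print(f"{entry:08x} -> bits set: {set_bits(entry)}")
```

Each value comes out with a single set bit, at positions 21, 22 and 23
respectively.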

Egrepping through the sources, I get the impression that this means
things aren't going as expected in swap space, but I don't pretend to
be a kernel hacker, so that is as sophisticated as my analysis gets.

The message that prompted me to write this email (reformatted):

EXT2-fs error (device 08:01): ext2_find_entry: bad entry in directory
#101620: inode out of bounds - offset=12 inode=4194306, rec_len=12,
name_len=2

It was repeated 7 times in quick succession. All I was doing was
logging in, so it wouldn't seem to be load related. It didn't hang
the machine.
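
The numbers in that message can be decoded with a few lines of Python.
This is only a sketch: sd_name is a hypothetical helper, and it assumes
the conventional SCSI disk numbering (major 8, 16 minors per disk). It
shows that device 08:01 is sda1 and that inode 4194306 is 0x400002,
i.e. the value 2 with bit 22 also set, the same bit position as the
00400000 seen in the free_one_pmd messages. That is an observation
about the values, not a diagnosis.

```python
def sd_name(major, minor):
    """Map a major:minor pair to an sdXN name.

    Assumes the conventional SCSI disk layout: major 8, 16 minors per disk,
    minor % 16 == 0 meaning the whole disk.
    """
    if major != 8 or not 0 <= minor <= 255:
        raise ValueError("only SCSI disk major 8, minors 0-255, handled here")
    disk, part = divmod(minor, 16)
    return "sd" + chr(ord("a") + disk) + (str(part) if part else "")

# "device 08:01" in the EXT2-fs error is major 8, minor 1.
print(sd_name(0x08, 0x01))

# The inode number from the error, in hex and with bit 22 cleared.
inode = 4194306
print(hex(inode))
print(inode ^ (1 << 22))
```

For reference, ext2 reserves inode 2 for the root directory, so a
directory entry pointing at inode 2 would not be unusual in itself.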

I've also had lockups and panics and such, but, as I said, I was
running kernels that were more-or-less known to have problems with
fast SCSI subsystems (or so I had gathered from these mailing lists),
so I figured I'd not clutter things up and just wait for the problems,
which seemed fairly well understood, to be resolved.

Now that these appear not to be the same problems as before, I will
endeavour to report them.

I'm open to any suggestions---I kind of went out on a limb suggesting
a Linux box rather than a SS20, and I must admit I'm feeling sort of
foolish right now. Fortunately, I'm not under terrifically heavy
deadline pressure to get this deployed.

Mike.

--
"Don't let me make you unhappy by failing to be contrary enough...."