Re: ext2 filesystem corruption?!?!??

Leonard N. Zubkoff (lnz@dandelion.com)
Thu, 10 Apr 1997 09:26:41 -0700


> Date: Thu, 10 Apr 1997 00:19:31 -0700 (PDT)
> From: Dan Hollis <goemon@sasami.anime.net>
>
> I am working hard with Gerard to make the NCR driver robust. Part of the
> problem seems to be that Linux does not have a good mid-level SCSI
> error handling mechanism - all the error handling seems to have to be done
> at a low level directly in the scsi driver.

I'm not certain of the entire set of comments Ted is referring to, but error
handling was only peripherally related to my comments on the NCR support. What
I said on this topic was that I had pushed strongly for the inclusion of
Gerard's BSD-ported driver in the standard kernel, since neither it nor the
standard driver completely dominated the other. There were reports that the
standard driver worked reliably for some people and the BSD-ported one did not,
and reports of just the opposite from other people. Because the NCR hardware
made greater demands on the PCI bus (fetching instructions from its script in
addition to performing data transfers), the NCR support was much more
sensitive to the quality of the motherboard's PCI implementation than was the
case for other PCI host adapters, and one driver or the other would work better
on a given motherboard. That argued strongly for providing both drivers as
standard options in the kernel. Recall that before 2.0.3, the BSD-ported
driver was only available as a separate release that people had to install in
their kernels manually. Gerard has put a great deal of excellent work into the
BSD-ported driver, and it wouldn't surprise me at all if it one day became the
default or only standard NCR driver.

As for error handling support from the SCSI mid-layer, it's not that the
support isn't there at all, but that the interface is, in my opinion, not well
designed. The error handling is largely controlled from the mid-level, and the
basic ideas are sound, but the interface requires the driver to do too much work
to protect itself against reentrancy. Eric Youngdale and I spent time at Linux
Expo sharing our ideas on improving this and are continuing to do so.

I also recall discussing the destructive testing process I use with the
BusLogic driver to make sure that it can recover from the timeout/abort/reset
cycle that occurs when there are SCSI bus problems.

Leonard