Re: [(patch?)] W95 can handle SCSI errors - Linux can't

Andries.Brouwer@cwi.nl
Tue, 25 Aug 1998 14:52:07 +0200 (MET DST)


From eric@andante.jic.com Tue Aug 25 05:52:02 1998

I would want to see the exact sequence you are seeing

Dear Eric,

I have no easy access to this 486 with Adaptec 1542 anymore;
the following report is from memory, and might not be 100% accurate.

Put a bad CD in the reader and mount it.
Sometimes the mount will fail with "No Medium" but usually it succeeds,
and read errors will occur while reading a file.
This is a home-burnt CD that is read correctly on some 4x drives,
but cannot be read on this 24x drive.
The result is that the drive never returns from a read command,
so the SCSI code times out.

scsi_block_when_processing_errors: Open returning 1
scsi_unjam_host: request sense?
Command to ID 3 timed out
Total of 0+1 commands on 1 devices require eh work
scsi_unjam_host: abort? - aha: Unable to abort
scsi_unjam_host: bus device reset? - aha: Trying device reset
test_unit_ready is called here, which asks for an abort again
aha: Unable to abort
scsi_eh_times_out is called
test_unit_ready returns FAILED

[So: the present situation is that the bus is fine and the
device is fine, but it is not ready because it has not finished
reading this sector from the bad CD.
Apparently the bus device reset does not change this.]

scsi_restart_operations: Waking up host to restart
Calling request function to restart things... (4x)
Error handler sleeping
thread 0 0 (77x)
In eh_done - result 0

So much for one possible sequence of events.

Clearly, the aha abort code does not work - it is empty.
One might suspect that the aha device reset code doesn't work either,
i.e., that aha1542_dev_reset() does not do anything.
There is also the interesting comment there
"Leonard says we are doing this wrong ...".

Thus, it is possible that scsi_error.c is in principle correct,
and all bugs I observed are in aha1542.c.
(Maybe scsi_error.c should know, for example, that there is no abort
function, and base its actions on that knowledge.
I mean that the pointer to the dummy aha1542_abort
should be replaced by NULL.)
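
A minimal sketch of that idea, in C. Every name here (eh_ops,
eh_escalate, the demo_* handlers) is hypothetical and made up for
illustration; none of it is taken from scsi_error.c or aha1542.c.
It only shows an error handler that skips recovery steps the driver
marks as absent with a NULL pointer:

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical result codes for one recovery step. */
enum eh_result { EH_SUCCESS, EH_FAILED };

typedef enum eh_result (*eh_step_fn)(int target_id);

/* Hypothetical, much reduced stand-in for a driver's entry points.
 * A NULL pointer means "this driver cannot perform this step". */
struct eh_ops {
    eh_step_fn abort_cmd;
    eh_step_fn device_reset;
    eh_step_fn bus_reset;
};

/* Try the steps from least to most disruptive, skipping the ones
 * that are not implemented. *steps_tried counts the handlers that
 * were actually called. */
enum eh_result eh_escalate(const struct eh_ops *ops, int target_id,
                           int *steps_tried)
{
    eh_step_fn ladder[3];
    size_t i;

    assert(ops != NULL && steps_tried != NULL);
    ladder[0] = ops->abort_cmd;
    ladder[1] = ops->device_reset;
    ladder[2] = ops->bus_reset;

    *steps_tried = 0;
    for (i = 0; i < 3; i++) {
        if (ladder[i] == NULL)
            continue;           /* absent: no call, no timeout wasted */
        (*steps_tried)++;
        if (ladder[i](target_id) == EH_SUCCESS)
            return EH_SUCCESS;
    }
    return EH_FAILED;
}

/* Stand-in handlers, for demonstration only. */
enum eh_result demo_step_ok(int target_id)
{
    (void)target_id;
    return EH_SUCCESS;
}

enum eh_result demo_step_fail(int target_id)
{
    (void)target_id;
    return EH_FAILED;
}
```

With abort_cmd set to NULL, a call such as eh_escalate(&ops, 3, &n)
goes straight to the device reset instead of first calling a dummy
abort that can only report failure.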

On the other hand, if aha1542_dev_reset actually works,
and does do a device reset, then these observations show
that a modern well-functioning SCSI device need not
become ready after a device reset, and scsi_error.c
should take this possibility into account.
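
One way to take this into account would be to poll the unit a
bounded number of times after the reset, rather than treating the
first "not ready" answer as a failure. The following C sketch is an
assumption about how that might look, not a description of the
existing code; wait_until_ready, probe_fn and the fake_unit test
device are all hypothetical names:

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical outcome of one readiness probe (e.g. a test unit ready). */
enum unit_state { UNIT_READY, UNIT_NOT_READY };

typedef enum unit_state (*probe_fn)(void *ctx);

/* Poll until the unit reports ready or max_polls probes have been made.
 * Returns the number of probes used on success, -1 if the unit never
 * became ready within the limit. */
int wait_until_ready(probe_fn probe, void *ctx, int max_polls)
{
    int polls;

    assert(probe != NULL);
    for (polls = 1; polls <= max_polls; polls++) {
        if (probe(ctx) == UNIT_READY)
            return polls;
        /* A real handler would sleep here before probing again. */
    }
    return -1;
}

/* Fake device for demonstration: stays busy for busy_polls probes. */
struct fake_unit {
    int busy_polls;
};

enum unit_state fake_probe(void *ctx)
{
    struct fake_unit *u = ctx;

    if (u->busy_polls > 0) {
        u->busy_polls--;
        return UNIT_NOT_READY;
    }
    return UNIT_READY;
}
```

The limit would have to be generous enough for slow devices; a unit
that is still busy after max_polls probes is reported as failed, as
before.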

This scsi bus also has a tape unit.
After a reset of the bus, the tape unit makes noises for
several seconds - it may well remain busy for, say,
10 seconds. This is also how I know that W95 does not do
a bus reset - there are no such noises when W95 gets an
I/O error on the CD.

If you want me to do some particular tests, that is probably
possible.

Andries
