Re: [PATCH] JMicron JM20337 USB-SATA data corruption bugfix - device152d:2338

From: Robert Hancock
Date: Tue Jul 22 2008 - 01:49:12 EST


Tomas Styblo wrote:
* Robert Hancock <hancockr@xxxxxxx> [Mon, 21 Jul 2008]:
I'm not sure this is a good approach. More that this code right above in usb_stor_invoke_transport, which your code undoes the effect of for this device, doesn't seem right:

/* If things are really okay, then let's show that. Zero
* out the sense buffer so the higher layers won't realize
* we did an unsolicited auto-sense. */
if (result == USB_STOR_TRANSPORT_GOOD &&
/* Filemark 0, ignore EOM, ILI 0, no sense */
(srb->sense_buffer[2] & 0xaf) == 0 &&
/* No ASC or ASCQ */
srb->sense_buffer[12] == 0 &&
srb->sense_buffer[13] == 0) {
srb->result = SAM_STAT_GOOD;
srb->sense_buffer[0] = 0x0;
}


The patch doesn't exactly undo the effect of the code above,
because the value of _result_ is different. When this problem

Yes, you and Alan are right, that code only triggers on a good result.

happens, the condition above is false, _result_ is
USB_STOR_TRANSPORT_FAILED and scsi_get_resid(srb) > 0, but the
chipset doesn't report any error (NO_SENSE,ASC==0,ASCQ==0). That's
why I think there's something wrong with the chipset.

I'm assuming what triggers the transport failure is it getting a US_BULK_STAT_FAIL result from the transfer inside usb_stor_Bulk_transport. It could be there's some condition in the chip that causes that error, yet doesn't provide any sense data.

In any case, given that your code apparently fixes the corruption it seems that srb->result is being set to SAM_STAT_CHECK_CONDITION, but the DID_ERROR and SUGGEST_RETRY flags are not being set. Presumably then the SCSI layer looks at the sense data and says "hmm, nothing to worry about here" and carries on.

I think we do need something like your patch, though it should likely be moved inside the if (need_auto_sense) check, and I don't see a reason to limit to this device ID only.


There are Windows users on various message boards who report the
same problem with this chipset - a kind of silent data corruption
that occurs only when copying large amounts of data.

But as I said I know little about SCSI and USB. I tried to locate
and fix the problem, but I can't tell whether the current error
handling code is written according to the relevant standards.

A more generic approach would certainly be better than hardcoded
device ids. Perhaps this check should be enabled for all devices?
Why not?


--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/