Re: ATA device reset, should I be concerned?

From: Tejun Heo
Date: Mon Jan 21 2008 - 12:02:48 EST


Alan Cox wrote:
>> Can you elaborate a bit? I don't really think completing a command
>> after 30sec timeout contributes a lot to driver stability.
>
> Timeout, timeout, timeout, reset, timeout.. (repeat), failed I/O
>
> This gives the end user no information about the fault, nor does it let
> the upper layers of SCSI and above distinguish between a random passing
> sulk and media errors which need the disk replacing.

I still don't think it's worth the trouble. There's currently only one
reported device which forgets to raise IRQ on media error. The behavior
is out of spec and rare. I don't think it's a good idea to change EH
behavior for it.

>>> Should that not then be a per host flag ?
>> Yeah, that would be the best. The problem is that there are several
>> different kinds of timeouts and we don't know which controller locks up
>> after which timeout and investigating them is really difficult.
>
> PATA controllers don't lock up in that case so it's quite easy. The one
> exception is if the device jams IORDY but in that case you are dead
> anyway the next I/O (except on a SIL680 which has a timer we could use).
>
> Old IDE says it works for PATA. For SATA I can see it might need more
> care and you might simply not be able to get the info.

Old IDE often locks up the machine hard after timeouts. I'm all for
gathering more info, but the benefit vs. risk equation just doesn't look
good here. Why take the risk for a rare device which forgets to raise an
IRQ on media error? If such behavior is widespread among PATA drives &&
we can verify that TF register access after timeout is safe for PATA
controllers, sure, but currently we aren't sure about either.

Thanks.

--
tejun