Re: ATA device reset, shoud I be concerned?

From: Tejun Heo
Date: Mon Jan 21 2008 - 19:31:50 EST


Hello,

Alan Cox wrote:
>> I still don't think it's worth the trouble. There's currently only one
>> reported device which forgets to raise IRQ on media error. The behavior
>
> Most people wouldn't realise what is going on.

Yeap, true but I don't think we have many timeouts due to media errors.
I've seen lots of SMART logs for drives which caused timeouts but
haven't seen any which logged related media errors.

>>> Old IDE says it works for PATA. For SATA I can see it might need more
>>> care and you might simply not be able to get the info.
>> Old IDE often locks up the machine hard after timeouts. I'm all for
>
> The code paths are racy - it didn't use to in 2.4 (except for the promise
> drain bug)

My jmicron locks up hard under certain conditions. I haven't
investigated it too deep but it looks like a hard lockup (controller
dying while holding PCI bus). NMI watchdog doesn't work afterwards.

>> gathering more info but benefit vs. risk equation just doesn't look good
>> here. Why take risk for a rare device which forgets to raise IRQ on
>> media error? If such behavior is wide spread among PATA drives && we
>> can verify that TF register access after timeout is safe for PATA
>> controllers, sure, but currently we aren't sure about either.
>
> We lose IRQs in lots of other cases. Promise PATA is particularly bad at
> forgetting to give us the completion interrupt.

In that case, completing commands after 30secs doesn't really help as
long as normal operation can be recovered afterward. The driver should
take measures against lost interrupts like polling for interrupts after
a while. Those are two different problems and require different almost
opposite solutions. Some controllers need registers polled once in a
while while others die when registers are read unexpectedly.

Thanks.

--
tejun
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/