Re: sata_nv + ADMA + Samsung disk problem

From: Robert Hancock
Date: Wed Jan 02 2008 - 19:22:50 EST


Allen Martin wrote:
The software definitely provides that guarantee for all NCQ-capable controllers.


Well if that's not it, it must be some problem entering ADMA legacy
mode. Here's what the Windows driver does:


ADMACtrl.aGO = 0
ADMACtrl.aEIEN = 0
poll {
until ADMAStatus.aLGCY = 1 || timeout
}

What we're doing to enter legacy mode is essentially:

-wait until ADMA status indicates IDLE bit set (max wait of 1 microsecond)
-clear GO bit in control register
-wait until status indicates LEGACY bit set (max wait of 1 microsecond)

and to enter ADMA mode:

-set GO bit in control register
-wait until status indicates LEGACY bit cleared and IDLE bit set (max wait of 1 microsecond)

The 1 microsecond timeout is admittedly pretty aggressive, but it apparently isn't being exceeded (the only mode-switch timeouts I've seen occur during error handling, after a command timeout has already occurred). What timeout value does the Windows driver use?
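
To make the two sequences above concrete, here is a rough user-space sketch of them against a simulated port. This is not the driver code: struct sim_port, read_status(), the bit positions and the 20-poll budget (standing in for 20 polls of about 50 ns each, i.e. 1 microsecond) are all assumptions made for illustration, and the simulated hardware reacts to the GO bit instantly, which real hardware may not.

```c
#include <stdbool.h>
#include <stdint.h>

/* Illustrative bit positions only; NOT taken from hardware documentation. */
enum {
    CTL_GO      = 1u << 0,  /* control: ADMA engine enabled */
    STAT_IDLE   = 1u << 0,  /* status: engine idle */
    STAT_LEGACY = 1u << 1,  /* status: port is in legacy mode */
};

/* Assumed poll budget: 20 polls of ~50 ns each, about 1 microsecond. */
#define MODE_SWITCH_POLLS 20

/* Hypothetical stand-in for the memory-mapped port registers. */
struct sim_port {
    uint16_t ctl;
    uint16_t status;
};

/* Simulated status read: this model follows the GO bit with no lag. */
static uint16_t read_status(struct sim_port *p)
{
    if (p->ctl & CTL_GO)
        p->status = STAT_IDLE;                /* ADMA mode, engine idle */
    else
        p->status = STAT_IDLE | STAT_LEGACY;  /* legacy mode */
    return p->status;
}

/* Poll until (status & mask) == want. Returns false if the budget runs out. */
static bool poll_status(struct sim_port *p, uint16_t mask, uint16_t want)
{
    for (int i = 0; i < MODE_SWITCH_POLLS; i++) {
        if ((read_status(p) & mask) == want)
            return true;
        /* a real driver would delay roughly 50 ns here */
    }
    return false;
}

/* Wait for IDLE, clear GO, wait for LEGACY. GO is cleared even if the
 * IDLE wait timed out; the timeout is only reflected in the result. */
static bool enter_legacy_mode(struct sim_port *p)
{
    bool idle = poll_status(p, STAT_IDLE, STAT_IDLE);
    p->ctl &= (uint16_t)~CTL_GO;
    bool legacy = poll_status(p, STAT_LEGACY, STAT_LEGACY);
    return idle && legacy;
}

/* Set GO, then wait for LEGACY cleared and IDLE set. */
static bool enter_adma_mode(struct sim_port *p)
{
    p->ctl |= CTL_GO;
    return poll_status(p, STAT_IDLE | STAT_LEGACY, STAT_IDLE);
}
```

Calling enter_adma_mode() and then enter_legacy_mode() on a zero-initialized struct sim_port returns true both times in this model; a false return corresponds to the mode-switch timeout discussed above.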

Also, I see you are clearing the aEIEN bit when in register mode, while we're not. Is that important/necessary?

Aside from all this, though, NCQ writes followed by a cache flush shouldn't put us into legacy mode at all: the cache flush is a no-data command, which we should be able to handle in ADMA mode, from my understanding (correct me if I'm wrong). So I don't imagine the legacy/ADMA mode switch could be the cause of this problem.
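
As a rough illustration of that reasoning, the sketch below counts how many ADMA/legacy transitions a command sequence would cause. The command classes and the rule in cmd_needs_legacy_mode() are simplifying assumptions for this example (in particular, that PIO and polled commands are the ones needing legacy mode); they are not the libata definitions or the driver's actual check.

```c
#include <stdbool.h>

/* Hypothetical, simplified command classes; not the libata definitions. */
enum cmd_proto { CMD_NODATA, CMD_PIO, CMD_DMA, CMD_NCQ };

/* Assumed rule: DMA, NCQ and interrupt-driven no-data commands can stay
 * in ADMA mode; polled commands and PIO transfers need legacy mode. */
static bool cmd_needs_legacy_mode(enum cmd_proto proto, bool polled)
{
    if (polled)
        return true;
    return proto == CMD_PIO;
}

/* Count ADMA<->legacy transitions for a sequence of interrupt-driven
 * commands, assuming the port starts out in ADMA mode. */
static int count_mode_switches(const enum cmd_proto *cmds, int n)
{
    bool legacy = false;
    int switches = 0;

    for (int i = 0; i < n; i++) {
        bool want = cmd_needs_legacy_mode(cmds[i], false);
        if (want != legacy) {
            switches++;
            legacy = want;
        }
    }
    return switches;
}
```

Under these assumptions, the sequence { CMD_NCQ, CMD_NCQ, CMD_NODATA } (two NCQ writes followed by a flush) produces zero mode switches, whereas a PIO command between two NCQ commands would produce two.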

I also saw in my previous investigation that a flush immediately followed by a write could cause the write to time out as well.

From some of the traces I took previously (posted on LKML as "sata_nv ADMA controller lockup investigation" back in February 2007), what seems to occur is this: when the second command is issued very rapidly after the previous command's completion (within less than 20 microseconds, or potentially longer), the ADMA status changes from 0x500 (STOPPED and IDLE) to 0x400 (just IDLE) as it typically does, but then it sticks there. No interrupt is ever raised, and the CPB response flags remain at 0.

--
Robert Hancock Saskatoon, SK, Canada
To email, remove "nospam" from hancockr@xxxxxxxxxxxxx
Home Page: http://www.roberthancock.com/

