Re: [PATCH 3/5] libata: Implement disk shock protection support

From: Robert Hancock
Date: Tue Aug 05 2008 - 00:18:42 EST


Tejun Heo wrote:
Elias Oltmanns wrote:
On user request (through sysfs), the IDLE IMMEDIATE command with UNLOAD
FEATURE as specified in ATA-7 is issued to the device and processing of
the request queue is stopped thereafter until the specified timeout
expires or user space asks to resume normal operation. This is supposed
to prevent the heads of a hard drive from accidentally crashing onto the
platter when a heavy shock is anticipated (like a falling laptop
expected to hit the floor). This patch simply stops processing the
request queue. In particular, it does not yet, for instance, defer an
SRST issued in order to recover from an error on the other device on the
interface.
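
For reference, a minimal sketch of the register values that IDLE IMMEDIATE
with UNLOAD FEATURE uses: command E1h, feature 44h, and the LBA signature
554E4Ch, with the drive reporting C4h in LBA low once the heads are
unloaded. The struct and function names below are hypothetical and
simplified for illustration; this is not the kernel's struct ata_taskfile
and not code from the patch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical, simplified taskfile for illustration only. */
struct unload_tf {
    uint8_t command;
    uint8_t feature;
    uint8_t lbal, lbam, lbah;
};

enum {
    CMD_IDLE_IMMEDIATE = 0xE1, /* IDLE IMMEDIATE opcode */
    FEATURE_UNLOAD     = 0x44, /* selects the unload feature */
    UNLOAD_DONE_LBAL   = 0xC4, /* LBA low after a successful unload */
};

/* Reassemble the 24-bit LBA signature from its three byte registers. */
static uint32_t unload_lba(const struct unload_tf *tf)
{
    return ((uint32_t)tf->lbah << 16) | ((uint32_t)tf->lbam << 8) | tf->lbal;
}

/* Fill in the registers for IDLE IMMEDIATE with the unload feature. */
static struct unload_tf build_unload_tf(void)
{
    struct unload_tf tf = {
        .command = CMD_IDLE_IMMEDIATE,
        .feature = FEATURE_UNLOAD,
        .lbal    = 0x4C, /* 'L' */
        .lbam    = 0x4E, /* 'N' */
        .lbah    = 0x55, /* 'U' */
    };

    /* Sanity check: the three bytes must spell the signature 554E4Ch. */
    assert(unload_lba(&tf) == 0x554E4C);
    return tf;
}

/* The drive signals a completed unload by returning C4h in LBA low. */
static bool unload_succeeded(uint8_t result_lbal)
{
    return result_lbal == UNLOAD_DONE_LBAL;
}
```

A device that does not implement the feature would not be expected to
return C4h, so a caller should check the result register rather than
assume the heads are parked.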

For libata, the easiest way to achieve the above would be to add a
per-dev EH action, say, ATA_EH_UNLOAD, and schedule EH w/ the action OR'd
into eh_info->action. The EH_UNLOAD handler can then issue the command,
wait for the specified number of seconds and continue. This will be
pretty simple to implement, as command exclusion and the like are all
handled automatically by the EH framework.
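
The shape of that suggestion can be sketched in plain user-space C. This
is only a toy model: the flag values, struct layout and function names
are invented for illustration and do not match libata's real definitions,
and the "issue" and "wait" steps are recorded rather than performed.

```c
#include <assert.h>
#include <stdint.h>

/* Illustrative action bits; the values are made up for this sketch. */
enum {
    EH_RESET  = 1u << 0,
    EH_UNLOAD = 1u << 1, /* the hypothetical per-device unload action */
};

/* Toy stand-in for the per-device EH request state. */
struct eh_info {
    uint32_t action;      /* pending EH actions, OR'd together */
    unsigned unload_secs; /* how long to keep the heads parked */
};

/* What the toy handler did, so callers can inspect it. */
struct eh_trace {
    int unload_issued;
    unsigned waited_secs;
};

/* Request an unload: OR the action in; EH would then be scheduled. */
static void request_unload(struct eh_info *ehi, unsigned secs)
{
    assert(ehi);
    ehi->action |= EH_UNLOAD;
    ehi->unload_secs = secs;
}

/* Toy EH pass: handle a pending unload request, then clear it. */
static struct eh_trace run_eh(struct eh_info *ehi)
{
    struct eh_trace t = { 0, 0 };

    assert(ehi);
    if (ehi->action & EH_UNLOAD) {
        t.unload_issued = 1;              /* stand-in for issuing the command */
        t.waited_secs = ehi->unload_secs; /* stand-in for the timed wait */
        ehi->action &= ~(uint32_t)EH_UNLOAD;
    }
    return t;
}
```

Because the request is just another bit in the action mask, it can be
combined with other pending actions, and the handler only runs once EH
owns the port, which is where the command exclusion comes from.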

However, SATA or not, there simply isn't a way to abort commands in ATA.
Issuing a random command while other commands are in progress is simply
a state machine violation, and there will be many interesting results,
including complete system lockup (ATA controller dying while holding the
PCI bus). The only reliable way to abort in-flight commands is to issue
a hardreset. However, the ATA reset protocol is not designed for quick
recovery. The machine is going to hit the ground hard, way before the
reset protocol is complete.

How long does hardreset have to take? I only see a 1ms delay in the COMRESET process (sata_link_hardreset). I'd think it would be feasible to do something like:

- stop the queue to prevent new commands from being issued
- wait a certain amount of time (20ms or so?) for existing command(s) to complete; if they do, issue the idle command
- if time runs out, trigger a hardreset and then issue the idle command

The drive is going to take a little while to actually unload the heads anyway, so a few milliseconds of delay doesn't seem like a big deal.
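
To make the sequence above concrete, here is a rough user-space sketch of
the decision logic. Everything in it is hypothetical: struct port_sim and
emergency_unload() are invented names, the in-flight command is modelled
as a single "time remaining" figure, and the hardreset is treated as a
flag. In particular it does not model how long a real hardreset takes,
which is exactly the open question.

```c
#include <assert.h>
#include <stdbool.h>

/* Which route was taken before the idle command was issued. */
enum unload_path {
    UNLOAD_AFTER_DRAIN,
    UNLOAD_AFTER_HARDRESET,
};

/* Toy port state; not a libata structure. */
struct port_sim {
    unsigned inflight_ms; /* time the in-flight commands still need */
    bool queue_stopped;
    bool reset_done;
    bool idle_issued;
};

static enum unload_path emergency_unload(struct port_sim *p, unsigned budget_ms)
{
    enum unload_path path;

    assert(p);

    /* Step 1: stop the queue so no new commands are issued. */
    p->queue_stopped = true;

    /* Step 2: give the in-flight commands up to budget_ms to complete. */
    if (p->inflight_ms <= budget_ms) {
        path = UNLOAD_AFTER_DRAIN;
    } else {
        /* Step 3: out of time, so abort them with a hardreset. */
        p->reset_done = true;
        path = UNLOAD_AFTER_HARDRESET;
    }
    p->inflight_ms = 0;

    /* Either way the port is now idle and the idle command can go out. */
    p->idle_issued = true;
    return path;
}
```

With a 20ms budget, a port whose commands need 5ms takes the drain path,
while one stuck for 500ms falls through to the reset path.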


The only way to solve this nicely is either to build the accelerometer
into the drive and let the drive protect itself, or to implement a
sideband signal to tell it to duck for cover. For SATA, this sideband
signal can be another OOB sequence. If it's ever implemented this way,
it will be in SControl, I guess.

Well, short of that, all we can do is to wait for the currently
in-flight commands to drain and hope that it happens before the machine
hits the ground, and also that the hard drive is not going through one
of the longish EH recovery sequences when it starts to fall. :-(

Well, Lenovo (and others?) have implemented this in Windows somehow. It would be interesting to know what solution they used there (doing a hardreset, issuing the command even when busy, or just waiting for the commands to hopefully finish in time).
