Re: 2.6.29 regression: ATA bus errors on resume (output with debugpatch)

From: Tejun Heo
Date: Sun May 24 2009 - 20:32:53 EST


Hello,

Niel Lambrechts wrote:
> Bug triggered with your patch! I played audio while suspending to try
> and increase activity (I also removed a CD on boot), and the filesystem
> came up dirty! This was on attempt nr. 3 or 4.

Great.

Here's the problem.

May 23 12:15:11 linux-7vph kernel: XXX scsi_eh_flush_done_q: online=1(2) noretry=2 retries=0 allowed=5

scsi_noretry_cmd() is returning non-zero, indicating that the request
shouldn't be retried and should fail immediately. Looks like the
return value 2 is from blk_failfast_dev(), which tests
REQ_FAILFAST_DEV. It's most likely set in init_request_from_bio()
while translating bio flags.
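
To make the log line concrete, here's a rough userspace sketch of the
retry decision as I read it. This is not the kernel code: the model_*
names and the struct are made up for illustration, and the bit
position of the failfast flag is an assumption picked so that the
mask comes out as 2, matching noretry=2 above.

```c
#include <stdbool.h>

/*
 * Hypothetical stand-in for the device failfast request flag.  Bit 1
 * is an assumption, chosen so the mask equals 2 like noretry=2 in the
 * log line.
 */
#define MODEL_REQ_FAILFAST_DEV (1u << 1)

/* Illustrative model of the per-command state printed by the debug patch. */
struct model_cmd {
	unsigned int cmd_flags;	/* request flags, translated from the bio */
	bool online;		/* is the device still online? */
	int retries;		/* retries used so far */
	int allowed;		/* retries permitted */
};

/* Returns the masked flag bit itself (0 or 2 here), not a 0/1 boolean. */
static unsigned int model_failfast_dev(const struct model_cmd *cmd)
{
	return cmd->cmd_flags & MODEL_REQ_FAILFAST_DEV;
}

/*
 * Retry only if the device is online, failfast does not veto the
 * retry, and the retry budget is not exhausted.  The retry counter is
 * bumped only when the first two checks pass.
 */
static bool model_should_retry(struct model_cmd *cmd)
{
	if (!cmd->online)
		return false;
	if (model_failfast_dev(cmd))
		return false;
	return ++cmd->retries <= cmd->allowed;
}
```

With online=1, retries=0 and allowed=5, a command carrying the
failfast flag is refused a retry even though the whole retry budget
is unused, which is what the log line shows.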

cc'ing Theodore Tso. Hello, Niel is reporting ext4 checking out after
resuming.

http://thread.gmane.org/gmane.linux.kernel/814466/focus=817937

The origin of the problem is the ATA device triggering a PHY event
after the resume sequence is complete. I still don't know why this
happens but it does on certain machines. This in itself shouldn't be
a big problem, as the device works fine after one more pass of ATA EH
and the in-flight requests would be retried. However, for some
reason, the aborted commands seem to have REQ_FAILFAST_DEV set and
thus fail immediately which, in turn, triggers ext4 errors. Does
anything ring a bell?

Thanks.

--
tejun