Re: 2.6.29 regression: ATA bus errors on resume (output with debugpatch)
From: Tejun Heo
Date: Sun May 24 2009 - 20:32:53 EST
Hello,
Niel Lambrechts wrote:
> Bug triggered with your patch! I played audio while suspending to try
> and increase activity (I also removed a CD on boot), and the filesystem
> came up dirty! This was on attempt nr. 3 or 4.
Great.
Here's the problem.
May 23 12:15:11 linux-7vph kernel: XXX scsi_eh_flush_done_q: online=1(2) noretry=2 retries=0 allowed=5
scsi_noretry_cmd() is returning non-zero, indicating that the request
should not be retried and should instead be failed immediately. The
return value 2 looks like it comes from blk_failfast_dev(), which tests
REQ_FAILFAST_DEV. The flag is most likely set in
init_request_from_bio() while translating bio flags.
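
To make the reasoning concrete, here is a minimal userspace sketch of the
decision reflected in the log line above. It is not the kernel code: the
struct, the sketch_* names and the bit position are assumptions for
illustration (the bit is chosen only so that the flag test yields 2, as
in noretry=2), and the retry condition is a simplified model of the
online / noretry / retries / allowed check.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Assumed bit position, picked so the masked value is 2 (noretry=2). */
#define SKETCH_FAILFAST_DEV (1u << 1)

/* Hypothetical, stripped-down stand-in for a SCSI command. */
struct sketch_cmd {
    unsigned int cmd_flags; /* request flags inherited from the bio */
    int online;             /* non-zero if the device is online */
    int retries;            /* retries consumed so far */
    int allowed;            /* retries permitted */
};

/* Returns the raw masked flag value, not a normalized 0/1. */
static unsigned int sketch_noretry(const struct sketch_cmd *cmd)
{
    assert(cmd != NULL);
    return cmd->cmd_flags & SKETCH_FAILFAST_DEV;
}

/*
 * Retry only if the device is online, the command is not failfast,
 * and the retry budget is not exhausted. A failfast command
 * short-circuits before the retry counter is touched.
 */
static bool sketch_should_retry(struct sketch_cmd *cmd)
{
    assert(cmd != NULL);
    return cmd->online && !sketch_noretry(cmd) &&
           ++cmd->retries <= cmd->allowed;
}
```

With online=1, retries=0 and allowed=5, a command carrying the failfast
bit is not retried even though it has its whole retry budget left, which
is the situation the debug output shows.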
cc'ing Theodore Tso. Hello, Niel is reporting ext4 errors after
resuming.
http://thread.gmane.org/gmane.linux.kernel/814466/focus=817937
The origin of the problem is the ATA device triggering a PHY event
after the resume sequence is complete. I still don't know why this
happens, but it does on certain machines. This in itself shouldn't be
a big problem, as the device works fine after one more pass of ATA EH
and the in-flight requests would be retried. However, for some reason,
the aborted commands seem to have REQ_FAILFAST_DEV set and thus fail
immediately, which, in turn, triggers ext4 errors. Does anything ring
a bell?
Thanks.
--
tejun