Re: [Regression] 3.15 mmc related ext4 corruption with qemu-system-arm

From: John Stultz
Date: Thu Jun 12 2014 - 19:51:12 EST


On Wed, Jun 11, 2014 at 10:35 PM, John Stultz <john.stultz@xxxxxxxxxx> wrote:
> Bisecting this points to: e7f3d22289e4307b3071cc18b1d8ecc6598c0be4
> (mmc: mmci: Handle CMD irq before DATA irq). Which I guess shouldn't
> be surprising, as I saw problems with that patch earlier in the
> 3.15-rc cycle:
> https://lkml.org/lkml/2014/4/14/824
>
[...]
>
> Unfortunately reverting the change (manually, as it doesn't revert
> cleanly anymore) doesn't seem to completely avoid the issue, so the
> bisection may have gone slightly astray (though it is interesting it
> landed on the same commit I earlier had trouble with). So I'll
> back-track and double check some of the last few "good" results to
> validate I didn't just luck into 3 good boots accidentally. I'll also
> review my revert in case I missed something subtle in doing it
> manually.

So I'm getting some baffling results. I started going back over the
git bisect logs to see if I had mis-marked a revision as good due to
the issue just not reproducing.
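
To put a rough number on how easily an intermittent failure gets
mis-marked as good: if a bad kernel shows the corruption on any given
boot with probability p, the chance of n clean boots in a row is
(1 - p)^n. No per-boot failure rate was measured here, so p is purely
an assumption, and the function names below are hypothetical. A minimal
sketch:

```python
import math


def false_good_probability(p_fail, boots):
    """Chance that a bad commit survives `boots` clean boots in a row.

    p_fail is an ASSUMED per-boot probability of hitting the corruption.
    """
    if not 0.0 <= p_fail <= 1.0:
        raise ValueError("p_fail must be between 0 and 1")
    return (1.0 - p_fail) ** boots


def boots_needed(p_fail, max_false_good):
    """Smallest boot count whose false-good probability is <= max_false_good."""
    if not 0.0 < p_fail < 1.0:
        raise ValueError("p_fail must be strictly between 0 and 1")
    if not 0.0 < max_false_good < 1.0:
        raise ValueError("max_false_good must be strictly between 0 and 1")
    return math.ceil(math.log(max_false_good) / math.log(1.0 - p_fail))
```

With an assumed p of 0.5, three clean boots still leave a 1-in-8
chance of calling a bad commit good, so a handful of extra boots per
bisect step may be worth the time.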

However, despite many, many reboots, the last good commit in my branch
- bb5cba40dc7f079ea7ee3ae760b7c388b6eb5fc3 (mmc: block: Fixup busy
detection while...) never shows the issue, while the immediately
following commit, the one bisect found -
e7f3d22289e4307b3071cc18b1d8ecc6598c0be4 (mmc: mmci: Handle CMD irq
before DATA irq) - always does.

The immensely frustrating part is that while backing that single change
out at its own commit always makes the issue go away, reverting it on
top of v3.15 doesn't. The issue persists. Since it doesn't revert
cleanly, I also reverted a later patch that it interacts with,
8d94b54d99ea968a9d188ca0e68793ebed601220 (mmc: mmci: Enable support for
busy detection....), to make sure I didn't miss some dependency, and the
issue *still* crops up. In fact, even backing out everything in
git diff bb5cba40dc7f079ea7ee3ae760b7c388b6eb5fc3..v3.15 drivers/mmc/
doesn't seem to resolve the issue.

So I'm really at a bit of a loss on what to do next. While it seems
that the "mmci: Handle CMD irq before DATA..." commit is problematic,
there also seems to be some other commit in v3.15 which results in the
same problematic behavior. I may try to bisect again between the
first bad commit and v3.15, reverting the bad commit each time to see
if I can chase it down, but if anyone has better debugging tools here,
I'd greatly appreciate it.
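
The "bisect again, reverting the bad commit each time" idea could be
automated with git bisect run, which treats exit status 0 as good, 125
as "skip this commit", and other statuses from 1 to 127 as bad. The
sketch below is only an illustration under assumptions: boot_cmd stands
for a hypothetical script that builds the kernel, boots it under
qemu-system-arm and exits non-zero when the ext4 corruption shows up,
and the function names are made up for this example.

```python
import subprocess

# First bad commit from the earlier bisection.
SUSPECT = "e7f3d22289e4307b3071cc18b1d8ecc6598c0be4"

# Exit statuses understood by `git bisect run`.
GOOD, BAD, SKIP = 0, 1, 125


def verdict(revert_ok, boot_results):
    """Map one bisect step onto a `git bisect run` exit status."""
    if not revert_ok or not boot_results:
        return SKIP  # revert did not apply, or nothing was tested
    return GOOD if all(boot_results) else BAD


def run(cmd):
    """Run a shell command; True when it exits with status 0."""
    return subprocess.call(cmd, shell=True) == 0


def bisect_step(boot_cmd, attempts):
    """Revert SUSPECT on the current commit, then boot up to `attempts` times."""
    revert_ok = run("git revert --no-commit " + SUSPECT)
    results = []
    if revert_ok:
        for _ in range(attempts):
            ok = run(boot_cmd)
            results.append(ok)
            if not ok:
                break  # one bad boot is enough to mark the commit bad
    # Drop the temporary revert so bisect can check out the next commit.
    run("git reset --hard")
    return verdict(revert_ok, results)
```

A small wrapper that ends with sys.exit(bisect_step("./build-and-boot.sh", 5))
(the script name being a placeholder) could then be handed to
git bisect run. Commits where the revert conflicts are skipped rather
than guessed at, which matters here since the change no longer reverts
cleanly on top of v3.15.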

Again, I'm happy to help interested folks get this reproducing on
their own machine for debugging.

thanks
-john