Re: [Regression] 3.15 mmc related ext4 corruption with qemu-system-arm

From: Ulf Hansson
Date: Fri Jun 13 2014 - 08:28:37 EST


On 13 June 2014 01:51, John Stultz <john.stultz@xxxxxxxxxx> wrote:
> On Wed, Jun 11, 2014 at 10:35 PM, John Stultz <john.stultz@xxxxxxxxxx> wrote:
>> Bisecting this points to: e7f3d22289e4307b3071cc18b1d8ecc6598c0be4
>> (mmc: mmci: Handle CMD irq before DATA irq). Which I guess shouldn't
>> be surprising, as I saw problems with that patch earlier in the
>> 3.15-rc cycle:
>> https://lkml.org/lkml/2014/4/14/824
>>
> [...]
>>
>> Unfortunately reverting the change (manually, as it doesn't revert
>> cleanly anymore) doesn't seem to completely avoid the issue, so the
>> bisection may have gone slightly astray (though it is interesting it
>> landed on the same commit I earlier had trouble with). So I'll
>> back-track and double check some of the last few "good" results to
>> validate I didn't just luck into 3 good boots accidentally. I'll also
>> review my revert in case I missed something subtle in doing it
>> manually.
>
> So I'm getting some baffling results. I started going back over the
> git bisect logs to see if I had mis-marked a revision as good due to
> the issue just not reproducing.
>
> However, despite many many reboots the last good commit in my branch
> - bb5cba40dc7f079ea7ee3ae760b7c388b6eb5fc3 (mmc: block: Fixup busy
> detection while...) doesn't ever show the issue. While the immediately
> following commit which bisect found -
> e7f3d22289e4307b3071cc18b1d8ecc6598c0be4 (mmc: mmci: Handle CMD irq
> before DATA irq) always does.
>
> The immensely frustrating part is while backing that single change off
> from its commit sha always makes the issue go away, reverting that
> change from on top of v3.15 doesn't. The issue persists. Since it
> doesn't revert cleanly, I also reverted a following patch that it
> interacted with 8d94b54d99ea968a9d188ca0e68793ebed601220 (mmc: mmci:
> Enable support for busy detection....) to make sure I didn't miss some
> dependency and the issue *still* crops up. In fact, even reverting the
> whole git diff bb5cba40dc7f079ea7ee3ae760b7c388b6eb5fc3..v3.15
> drivers/mmc/ doesn't seem to resolve the issue.
>
> So I'm really at a bit of a loss on what to do next. While it seems
> that the "mmci: Handle CMD irq before DATA..." commit is problematic,
> there also seems to be some other commit in v3.15 which results in the
> same problematic behavior. I may try to bisect again between the
> first bad commit and v3.15, reverting the bad commit each time to see
> if I can chase it down, but if anyone has better debugging tools here,
> I'd greatly appreciate it.
>
> Again, I'm happy to help interested folks get this reproducing on
> their own machine for debugging.
>

Hi John,

I have quickly implemented my proposal 1). I am testing the patches on
real HW now; I will post them as soon as I can and keep you on cc.

I would also really appreciate it if you could give them a quick try
in your QEMU environment.

Kind regards
Uffe