[BUG] mmc: core: adjust polling interval for CMD1

From: H. Nikolaus Schaller
Date: Thu Feb 17 2022 - 15:12:07 EST


Hi,
Jean Rene Dawin reported a problem to me on the BeagleBone Black, starting
with our distro kernel based on v5.17-rc1:

>> since kernel 5.17-rc1 I noticed slower emmc performance on Beaglebone
>> Black, but didn't check the logs.
>> When I tried to run 5.17.0-rc3-letux+ it booted fine, but during IO
>> traffic there were messages like
>>
>> [ 662.529584] mmc1: error -110 doing runtime resume
>> [ 669.293590] mmc1: Card stuck being busy! __mmc_poll_for_busy
>>
>> [ 739.076072] mmc1: Timeout waiting for hardware interrupt.
>> [ 739.145676] mmc1: sdhci: ============ SDHCI REGISTER DUMP ===========
>> [ 739.231053] mmc1: sdhci: Sys addr: 0x00000000 | Version: 0x00003101
>> [ 739.316472] mmc1: sdhci: Blk size: 0x00000200 | Blk cnt: 0x00000400
>> [ 739.401937] mmc1: sdhci: Argument: 0x00342d30 | Trn mode: 0x00000023
>> [ 739.487439] mmc1: sdhci: Present: 0x01f70000 | Host ctl: 0x00000000
>> [ 739.573007] mmc1: sdhci: Power: 0x0000000f | Blk gap: 0x00000000
>> [ 739.658609] mmc1: sdhci: Wake-up: 0x00000000 | Clock: 0x00003c07
>> [ 739.744224] mmc1: sdhci: Timeout: 0x00000007 | Int stat: 0x00000002
>> [ 739.829896] mmc1: sdhci: Int enab: 0x027f000b | Sig enab: 0x027f000b
>> [ 739.915623] mmc1: sdhci: ACmd stat: 0x00000000 | Slot int: 0x00000001
>> [ 740.001394] mmc1: sdhci: Caps: 0x07e10080 | Caps_1: 0x00000000
>> [ 740.087208] mmc1: sdhci: Cmd: 0x0000193a | Max curr: 0x00000000
>> [ 740.173051] mmc1: sdhci: Resp[0]: 0x00000900 | Resp[1]: 0x00000000
>> [ 740.258928] mmc1: sdhci: Resp[2]: 0x00000000 | Resp[3]: 0x00000000
>> [ 740.344854] mmc1: sdhci: Host ctl2: 0x00000000
>> [ 740.402796] mmc1: sdhci: ============================================
>>
>> and finally IO errors and a corrupted filesystem.
>>
>> 5.17.0-rc4-letux+ shows the same behaviour.

I checked with my BeagleBone Black (am3358) and can confirm this observation.
It happens only with the integrated eMMC, not with the µSD connected to
the other mmc interface.

A git bisect found:

76bfc7ccc2fa9d382576f6013b57a0ef93d5a722 is the first bad commit
commit 76bfc7ccc2fa9d382576f6013b57a0ef93d5a722
Author: Huijin Park <huijin.park@xxxxxxxxxxx>
Date: Thu Nov 4 15:32:31 2021 +0900

mmc: core: adjust polling interval for CMD1

In mmc_send_op_cond(), loops are continuously performed at the same
interval of 10 ms. However the behaviour is not good for some eMMC
which can be out from a busy state earlier than 10 ms if normal.

Rather than fixing about the interval time in mmc_send_op_cond(),
let's instead convert into using the common __mmc_poll_for_busy().

The reason for adjusting the interval time is that it is important
to reduce the eMMC initialization time, especially in devices that
use eMMC as rootfs.

Test log(eMMC:KLM8G1GETF-B041):

before: 12 ms (0.311016 - 0.298729)
[ 0.295823] mmc0: starting CMD0 arg 00000000 flags 000000c0
[ 0.298729] mmc0: starting CMD1 arg 40000080 flags 000000e1<-start
[ 0.311016] mmc0: starting CMD1 arg 40000080 flags 000000e1<-finish
[ 0.311336] mmc0: starting CMD2 arg 00000000 flags 00000007

after: 2 ms (0.301270 - 0.298762)
[ 0.295862] mmc0: starting CMD0 arg 00000000 flags 000000c0
[ 0.298762] mmc0: starting CMD1 arg 40000080 flags 000000e1<-start
[ 0.299067] mmc0: starting CMD1 arg 40000080 flags 000000e1
[ 0.299441] mmc0: starting CMD1 arg 40000080 flags 000000e1
[ 0.299879] mmc0: starting CMD1 arg 40000080 flags 000000e1
[ 0.300446] mmc0: starting CMD1 arg 40000080 flags 000000e1
[ 0.301270] mmc0: starting CMD1 arg 40000080 flags 000000e1<-finish
[ 0.301572] mmc0: starting CMD2 arg 00000000 flags 00000007

Signed-off-by: Huijin Park <huijin.park@xxxxxxxxxxx>
Link: https://lore.kernel.org/r/20211104063231.2115-3-huijin.park@xxxxxxxxxxx
Signed-off-by: Ulf Hansson <ulf.hansson@xxxxxxxxxx>
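
For illustration, here is a minimal Python sketch (a simulation, not kernel
code) of why a growing polling interval can finish sooner than a fixed 10 ms
one when the card leaves the busy state early. The function names, the 32 us
starting delay, the 32768 us cap and the 2000 us ready time are assumptions
chosen for the example; they are not taken from the commit. The last lines
compute the gaps between the CMD1 timestamps of the "after" log quoted in the
commit message.

```python
def poll_times_fixed(ready_at_us, interval_us=10_000):
    """Times (us) at which a fixed-interval loop polls until the card is ready."""
    t = 0
    times = [t]
    while t < ready_at_us:
        t += interval_us
        times.append(t)
    return times


def poll_times_backoff(ready_at_us, start_us=32, cap_us=32_768):
    """Times (us) at which a loop polls when the delay doubles up to a cap.

    start_us and cap_us are assumed values for this sketch.
    """
    t = 0
    delay = start_us
    times = [t]
    while t < ready_at_us:
        t += delay
        times.append(t)
        if delay < cap_us:
            delay *= 2
    return times


# Hypothetical card that becomes ready 2000 us after the first CMD1.
fixed = poll_times_fixed(2000)
backoff = poll_times_backoff(2000)
print("fixed:  ", fixed)
print("backoff:", backoff)

# CMD1 timestamps from the "after" log, converted to microseconds.
after_log_us = [298762, 299067, 299441, 299879, 300446, 301270]
gaps_us = [b - a for a, b in zip(after_log_us, after_log_us[1:])]
print("gaps between CMD1 retries (us):", gaps_us)
```

With these assumed values the fixed loop notices readiness at 10000 us and the
doubling loop at 2016 us. The gaps in the quoted log grow from 305 us to
824 us, which is consistent with an interval that increases on each retry.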

Reverting this commit makes v5.17-rc[1-4] work again.

Any suggestions or fixes?

BR and thanks,
Nikolaus Schaller

Reported-by: jdawin@xxxxxxxxxxxxxxxxxxxxx