Re: [PATCH v2 2/2] mmc: core: fall back host->f_init if failing to init mmc card after resume

From: Shawn Lin
Date: Wed Aug 03 2016 - 03:58:32 EST


Hi Jaehoon,

On 2016/8/3 14:54, Jaehoon Chung wrote:
Hi Shawn,

On 08/03/2016 10:35 AM, Shawn Lin wrote:
Hi Jaehoon,

On 2016/8/2 18:47, Jaehoon Chung wrote:
Hi Shawn,

On 08/02/2016 06:07 PM, Shawn Lin wrote:
Hi Ulf,

On 2016/7/20 9:57, Shawn Lin wrote:
We occasionally see card initialization fail after resume. It is hard
to reproduce, but we did get reports from the suspend/resume test on
our RK3399 mp test farm. Unfortunately, we still have not figured out
what went wrong at that time. Retrying with the same host->f_init,
without falling back, does not help. This patch does solve the problem:
with some added logging we can see that the mmc card resumes
successfully after host->f_init falls back. No obvious side effect was
found, so this patch should improve stability.

[ 93.405085] mmc1: unexpected status 0x800900 after switch
[ 93.408474] mmc1: switch to bus width 1 failed
[ 93.408482] mmc1: mmc_select_hs200 failed, error -110
[ 93.408492] mmc1: error -110 during resume (card was removed?)
[ 93.408705] PM: resume of devices complete after 213.453 msecs


Status 0x800900 has COM_CRC_ERROR set..it seems that the CRC check fails.
But I don't know how that is related to "fall back host->f_init".
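
For reference, the status word from the log can be decoded by hand. The snippet below is only a user-space sketch: the masks are redefined locally and are meant to mirror the R1_* macros in include/linux/mmc/mmc.h, while struct r1_fields and decode_r1() are hypothetical helpers made up for illustration.

```c
#include <stdbool.h>
#include <stdint.h>

/* Local copies of the R1 card-status masks (same layout as the kernel's
 * R1_* macros in include/linux/mmc/mmc.h). */
#define R1_COM_CRC_ERROR    (1u << 23)                    /* CRC check of previous command failed */
#define R1_CURRENT_STATE(x) (((x) & 0x00001E00u) >> 9)    /* bits 12:9 */
#define R1_READY_FOR_DATA   (1u << 8)
#define R1_STATE_TRAN       4u                            /* "tran" (transfer) state */

/* Hypothetical helper type, not a kernel structure. */
struct r1_fields {
	bool com_crc_error;
	unsigned int current_state;
	bool ready_for_data;
};

/* Split an R1 status word into the three fields of interest here. */
static struct r1_fields decode_r1(uint32_t status)
{
	struct r1_fields f;

	f.com_crc_error = (status & R1_COM_CRC_ERROR) != 0;
	f.current_state = R1_CURRENT_STATE(status);
	f.ready_for_data = (status & R1_READY_FOR_DATA) != 0;
	return f;
}
```

Calling decode_r1(0x800900) gives COM_CRC_ERROR set, current state 4 (tran) and READY_FOR_DATA set, so the card itself looks idle and ready; only the CRC flag for the previous command is raised.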

Yup, actually it also looks strange to me that we should downgrade
host->f_init when resuming. A CRC error shouldn't occur, as 400K
works at boot time; we also see HS400 work normally later, which
makes me believe it is not a signal problem. But we need to figure
out why the controller thinks it is a CRC error.

The best way is to make it easy to reproduce so that we can check the
PCB signal, and I'm still trying to do that. Or there may be a HW/chip
condition that occasionally makes my eMMC PHY work improperly. Anyway,
more proof should be provided before I'm able to land a patch to
fix/avoid the root cause. I'm working on it..


I don't have any knowledge of rockchip...
but in my experience, there are some cases that are not an mmc core problem..

1. Exynos uses gpio for the clk/cmd/data lines..and gpio has a driver strength value.
If the driver strength changes after resuming, the error can occur.

Yes, the related settings or configuration for PHY didn't change.


2. And glitch for I/O line..this loop has the delay..Just delay?

We have retried 400K when resume fails, and kept retrying without
breaking out while it still failed, but it doesn't help.



So you can check the other problem... :)

At boot time, f_init can use 400K..but after resuming..f_init needs to use 100K..hmm..strange..


Agreed..

So let's come back to the topic -- should we support downgrading f_init
after failing to resume, just as we do at boot time? It's possible that
environment changes (noise, temperature, static) will lead to failure
after resuming. Shouldn't the mechanism be more robust in dealing with
these unexpected cases? :)

I think you also felt that this patch is a workaround.

More or less :)

Because it's not clear why the CRC error occurred and resuming fails.

Yup, that stops me from landing a "real" patch to fix it. But on second
thought, there are many unexpected cases that can make the resume fail.
We may see a different f_init from one boot to the next when doing a
reboot test on a certain board (actually I did see it many, many times).
That is why I'm sure it's the same for the resume case.

(The CRC error should disappear when the clock is set to a lower value..)

And I'm not sure that 100K is enough..If 100K also doesn't work at resume time, what do we do?
This issue is interesting..because it can occur in real cases...I have also seen a similar issue.
But my solution was also a workaround. :(

Nice to hear that you have suffered from a similar issue, not just me.
Could you share your WR solution? :)


If it were clearer than now, I could agree with your patch..but now..I don't know what is correct.
I will listen to Ulf's and the other guys' opinions..also yours.

Best Regards,
Jaehoon Chung



Any comments for this patch? :)

Signed-off-by: Shawn Lin <shawn.lin@xxxxxxxxxxxxxx>

---

Changes in v2:
- remove mmc_power_off
- take f_min into consideration

drivers/mmc/core/mmc.c | 19 +++++++++++++++++--
1 file changed, 17 insertions(+), 2 deletions(-)

diff --git a/drivers/mmc/core/mmc.c b/drivers/mmc/core/mmc.c
index 403b97b..a2891c1 100644
--- a/drivers/mmc/core/mmc.c
+++ b/drivers/mmc/core/mmc.c
@@ -1945,6 +1945,7 @@ static int mmc_suspend(struct mmc_host *host)
 static int _mmc_resume(struct mmc_host *host)
 {
 	int err = 0;
+	int i;

 	BUG_ON(!host);
 	BUG_ON(!host->card);
@@ -1954,8 +1955,22 @@ static int _mmc_resume(struct mmc_host *host)
 	if (!mmc_card_suspended(host->card))
 		goto out;

-	mmc_power_up(host, host->card->ocr);
-	err = mmc_init_card(host, host->card->ocr, host->card);
+	/*
+	 * Let's try to fallback the host->f_init
+	 * if failing to init mmc card after resume.
+	 */
+	for (i = 0; i < ARRAY_SIZE(freqs); i++) {
+		if (host->f_init < max(freqs[i], host->f_min))
+			continue;
+		else
+			host->f_init = max(freqs[i], host->f_min);
+
+		mmc_power_up(host, host->card->ocr);
+		err = mmc_init_card(host, host->card->ocr, host->card);
+		if (!err)
+			break;
+	}
+
 	mmc_card_clr_suspended(host->card);

 out:
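
To make the behaviour of the loop above easier to reason about outside the kernel, here is a rough user-space sketch of the same fallback logic. It assumes freqs[] has the same contents as the table in drivers/mmc/core/core.c; struct fake_host, init_fn, resume_with_fallback() and init_ok_at_or_below() are hypothetical names that stand in for struct mmc_host and the mmc_power_up() + mmc_init_card() pair.

```c
#include <stddef.h>

#define ARRAY_SIZE(a) (sizeof(a) / sizeof((a)[0]))

/* Assumed to match the init frequency table in drivers/mmc/core/core.c (Hz). */
static const unsigned int freqs[] = { 400000, 300000, 200000, 100000 };

/* Hypothetical stand-in for the two struct mmc_host fields the loop uses. */
struct fake_host {
	unsigned int f_init;
	unsigned int f_min;
};

/* Hypothetical stand-in for mmc_power_up() + mmc_init_card(): 0 on success. */
typedef int (*init_fn)(unsigned int f_init, void *ctx);

static unsigned int max_uint(unsigned int a, unsigned int b)
{
	return a > b ? a : b;
}

/* Same control flow as the patched _mmc_resume() loop. */
static int resume_with_fallback(struct fake_host *host, init_fn try_init, void *ctx)
{
	int err = 0;	/* initial value as in the patch */
	size_t i;

	for (i = 0; i < ARRAY_SIZE(freqs); i++) {
		unsigned int f = max_uint(freqs[i], host->f_min);

		/* Never go above the f_init that was already in use. */
		if (host->f_init < f)
			continue;
		host->f_init = f;

		err = try_init(host->f_init, ctx);
		if (!err)
			break;
	}
	return err;
}

/* Test stub: init succeeds only at or below the limit passed in ctx. */
static int init_ok_at_or_below(unsigned int f_init, void *ctx)
{
	unsigned int limit = *(unsigned int *)ctx;

	return f_init <= limit ? 0 : -110;	/* -110 is -ETIMEDOUT */
}
```

With f_init = 400000, f_min = 100000 and a stub that only succeeds at 200000 or below, the loop tries 400K and 300K, then succeeds at 200K and leaves f_init at 200000. With f_min = 300000 the same stub never succeeds: the loop tries 400K once, then 300K three times (the last two table entries are clamped up to f_min) and returns the error.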

--
Best Regards
Shawn Lin