Re: [PATCH v2] mmc: inline the first mmc_scan() on mmc_start_host()

From: Dennis Zhou
Date: Fri Jun 30 2023 - 18:09:28 EST


Hi Ulf,

On Fri, Jun 30, 2023 at 01:26:14PM +0200, Ulf Hansson wrote:
> On Tue, 27 Jun 2023 at 19:20, Geert Uytterhoeven <geert@xxxxxxxxxxxxxx> wrote:
> >
> > Hi Dennis,
> >
> > On Thu, Mar 30, 2023 at 1:48 AM Dennis Zhou <dennis@xxxxxxxxxx> wrote:
> > > When using dm-verity with a data partition on an emmc device, dm-verity
> > > races with the discovery of attached emmc devices. This is because mmc's
> > > probing code sets up the host data structure and then schedules a work
> > > item to do discovery afterwards. To prevent this race on init,
> > > let's inline the first call to detection, __mm_scan(), and let
> > > subsequent detect calls be handled via the workqueue.
> > >
> > > Signed-off-by: Dennis Zhou <dennis@xxxxxxxxxx>
> >
> > Thanks for your patch, which is now commit 2cc83bf7d41113d9 ("mmc:
> > core: Allow mmc_start_host() synchronously detect a card") in
> > linux-next/master mmc/next next-20230614 next-20230615 next-20230616
> >
> > I have bisected the following failure on Renesas Salvator-XS with R-Car H3
> > ES2.0 to the above commit:
> >
> > renesas_sdhi_internal_dmac ee140000.mmc: timeout waiting for hardware interrupt (CMD0)
> > renesas_sdhi_internal_dmac ee140000.mmc: timeout waiting for hardware interrupt (CMD1)
> > renesas_sdhi_internal_dmac ee140000.mmc: timeout waiting for hardware interrupt (CMD0)
> > renesas_sdhi_internal_dmac ee140000.mmc: timeout waiting for hardware interrupt (CMD1)
> > mmc0: Failed to initialize a non-removable card
>
> Thanks for reporting!
>
> After taking a closer look, I realized that all the renesas/tmio drivers
> suffer from a similar problem. A host driver must not call
> mmc_add_host() before it's ready to serve requests.
>
> Things like initializing an irq-handler must be done before
> mmc_add_host() is called, which is not the case for renesas/tmio. In
> fact, there seem to be a few other host drivers with a similar
> pattern in their probe routines.
>
> Note that, even though the offending commit triggers this problem in
> 100% of cases (as the probe path has now become synchronous), there
> was a potential risk even before. Previously, mmc_add_host() ended up
> punting work to a workqueue, and if that work ended up sending a
> request to the host driver *before* the irq-handler was ready, we
> would hit a similar problem. I bet adding an msleep(1000) immediately
> after mmc_add_host() in tmio_mmc_host_probe() would trigger this
> problem too. :-)
>

I'm deeply appreciative that you're willing to get to the bottom of the
issue.

> That said, I am going to revert the offending commit to fix these
> problems, for now. Then I will try to help out and fix up the relevant
> host drivers - and when that is done, we can give this whole thing a
> new try.
>
> Any objections or other suggestions to this?
>

Acked-by: Dennis Zhou <dennis@xxxxxxxxxx>

Thanks,
Dennis

> Kind regards
> Uffe
>