Re: [PATCH 4.9 00/62] 4.9.335-rc1 review

From: Adrian Hunter
Date: Tue Dec 06 2022 - 04:25:04 EST


On 6/12/22 02:11, Florian Fainelli wrote:
> On 12/5/22 14:48, Florian Fainelli wrote:
>> On 12/5/22 14:28, Jon Hunter wrote:
>>> Hi Greg,
>>>
>>> On 05/12/2022 19:08, Greg Kroah-Hartman wrote:
>>>> This is the start of the stable review cycle for the 4.9.335 release.
>>>> There are 62 patches in this series, all will be posted as a response
>>>> to this one.  If anyone has any issues with these being applied, please
>>>> let me know.
>>>>
>>>> Responses should be made by Wed, 07 Dec 2022 19:07:46 +0000.
>>>> Anything received after that time might be too late.
>>>>
>>>> The whole patch series can be found in one patch at:
>>>>     https://www.kernel.org/pub/linux/kernel/v4.x/stable-review/patch-4.9.335-rc1.gz
>>>> or in the git tree and branch at:
>>>>     git://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable-rc.git linux-4.9.y
>>>> and the diffstat can be found below.
>>>>
>>>> thanks,
>>>>
>>>> greg k-h
>>>>
>>>> -------------
>>>> Pseudo-Shortlog of commits:
>>>>
>>>> Greg Kroah-Hartman <gregkh@xxxxxxxxxxxxxxxxxxx>
>>>>      Linux 4.9.335-rc1
>>>>
>>>> Adrian Hunter <adrian.hunter@xxxxxxxxx>
>>>>      mmc: sdhci: Fix voltage switch delay
>>>
>>>
>>> I am seeing a boot regression on a couple boards and bisect is pointing to the above commit.
>>
>> Same thing here, getting a hard lock for our devices with the SDHCI controller enabled, sometimes we are lucky to see the following:
>>
>> [    4.790367] mmc0: SDHCI controller on 84b0000.sdhci [84b0000.sdhci] using ADMA 64-bit
>> [   25.802351] INFO: rcu_sched detected stalls on CPUs/tasks:
>> [   25.807871]  1-...: (1 GPs behind) idle=561/140000000000000/0 softirq=728/728 fqs=5252
>> [   25.815892]  (detected by 0, t=21017 jiffies, g=61, c=60, q=55)
>> [   25.821834] Task dump for CPU 1:
>> [   25.825069] kworker/1:1     R  running task        0   509      2 0x00000002
>> [   25.832164] Workqueue: events_freezable mmc_rescan
>> [   25.836974] Backtrace:
>> [   25.839440] [<ce32fea4>] (0xce32fea4) from [<ce32fed4>] (0xce32fed4)
>> [   25.845803] Backtrace aborted due to bad frame pointer <cd2f0a54>
>>
>> Also confirmed that reverting that change ("mmc: sdhci: Fix voltage switch delay") allows devices to boot properly.
>>
>> Had not a chance to test the change when submitted for mainline despite being copied, sorry about that.
>>
>> Since that specific commit is also included in the other stable trees (5.4, 5.10, 5.15 and 6.0) I will let you know whether the same issue is present in those trees shortly thereafter.
>
> This only appears to impact 4.9, Adrian is there a missing functional dependency for "mmc: sdhci: Fix voltage switch delay" to work correctly on the 4.9 kernel?

The thing that leaps to mind is that "mmc: sdhci: Fix voltage switch delay" returns from sdhci_set_ios() without releasing the spinlock. That spinlock was removed in later kernels, which would explain why only 4.9 is affected. I expect the change below would help, but a revert might allow a more considered response - it is a holiday here today.


diff --git a/drivers/mmc/host/sdhci.c b/drivers/mmc/host/sdhci.c
index cfd665f0d6db..f3e2aba53ffa 100644
--- a/drivers/mmc/host/sdhci.c
+++ b/drivers/mmc/host/sdhci.c
@@ -1684,7 +1684,7 @@ static void sdhci_set_ios(struct mmc_host *mmc, struct mmc_ios *ios)
 	    host->timing == ios->timing &&
 	    host->version >= SDHCI_SPEC_300 &&
 	    !sdhci_presetable_values_change(host, ios))
-		return;
+		goto out;
 
 	ctrl = sdhci_readb(host, SDHCI_HOST_CONTROL);

@@ -1773,7 +1773,7 @@ static void sdhci_set_ios(struct mmc_host *mmc, struct mmc_ios *ios)
 			host->ops->set_clock(host, host->clock);
 		} else
 			sdhci_writeb(host, ctrl, SDHCI_HOST_CONTROL);
-
+out:
 	/*
 	 * Some (ENE) controllers go apeshit on some ios operation,
 	 * signalling timeout and CRC errors even on CMD0. Resetting