Re: Issue with Broadcom wireless in 5.2rc1 (was Re: [PATCH] mmc: sdhci: queue work after sdhci_defer_done())

From: Arend Van Spriel
Date: Sun May 26 2019 - 14:51:29 EST


On 5/26/2019 2:21 PM, Brian Masney wrote:
+ Broadcom wireless maintainers

On Fri, May 24, 2019 at 11:49:58AM -0400, Brian Masney wrote:
On Fri, May 24, 2019 at 03:17:13PM +0300, Adrian Hunter wrote:
On 24/05/19 2:10 PM, Brian Masney wrote:
WiFi stopped working on the LG Nexus 5 phone and the issue was bisected
to the commit c07a48c26519 ("mmc: sdhci: Remove finish_tasklet") that
moved from using a tasklet to a work queue. That patch also changed
sdhci_irq() to return IRQ_WAKE_THREAD instead of finishing the work when
sdhci_defer_done() is true. Change it to queue work to the complete work
queue if sdhci_defer_done() is true so that the functionality is
equilivent to what was there when the finish_tasklet was present. This
corrects the WiFi breakage on the Nexus 5 phone.

Signed-off-by: Brian Masney <masneyb@xxxxxxxxxxxxx>
Fixes: c07a48c26519 ("mmc: sdhci: Remove finish_tasklet")
---
[ ... ]

drivers/mmc/host/sdhci.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/mmc/host/sdhci.c b/drivers/mmc/host/sdhci.c
index 97158344b862..3563c3bc57c9 100644
--- a/drivers/mmc/host/sdhci.c
+++ b/drivers/mmc/host/sdhci.c
@@ -3115,7 +3115,7 @@ static irqreturn_t sdhci_irq(int irq, void *dev_id)
continue;
if (sdhci_defer_done(host, mrq)) {
- result = IRQ_WAKE_THREAD;
+ queue_work(host->complete_wq, &host->complete_work);

The IRQ thread has a lot less latency than the work queue, which is why it
is done that way.

I am not sure why you say this change is equivalent to what was there
before, nor why it fixes your problem.

Can you explain some more?

[ ... ]

drivers/net/wireless/broadcom/brcm80211/brcmfmac/sdio.c calls
sdio_claim_host() and it appears to never return.

When the brcmfmac driver is loaded, the firmware is requested from disk,
and that's when the deadlock occurs in 5.2rc1. Specifically:

1) brcmf_sdio_download_firmware() in
drivers/net/wireless/broadcom/brcm80211/brcmfmac/sdio.c calls
sdio_claim_host()

2) brcmf_sdio_firmware_callback() is called and brcmf_sdiod_ramrw()
tries to claim the host, but has to wait since its already claimed
in #1 and the deadlock occurs.

This does not make any sense to me. brcmf_sdio_download_firmware() is called from brcmf_sdio_firmware_callback() so they are in the same context. So #2 is not waiting for #1, but something else I would say. Also #2 calls sdio_claim_host() after brcmf_sdio_download_firmware has completed so definitely not waiting for #1.

I tried to release the host before the firmware is requested, however
parts of brcmf_chip_set_active() needs the host to be claimed, and a
similar deadlock occurs in brcmf_sdiod_ramrw() if I claim the host
before calling brcmf_chip_set_active().

I started to look at moving the sdio_{claim,release}_host() calls out of
brcmf_sdiod_ramrw() but there's a fair number of callers, so I'd like to
get feedback about the best course of action here.

Long ago Franky reworked the sdio critical sections requiring sdio claim/release and I am pretty sure they are correct.

Could you try with lockdep kernel and see if that brings any more information. In the mean time I will update my dev branch to 5.2-rc1 and see if I can find any clues.

Regards,
Arend