Re: [PATCH 4.13 03/28] Bluetooth: btusb: fix QCA Rome suspend/resume

From: Brian Norris
Date: Tue Dec 19 2017 - 18:11:43 EST


Hi Kai,

On Tue, Dec 19, 2017 at 12:28:17PM +0800, Kai Heng Feng wrote:
> > On 19 Dec 2017, at 2:13 AM, Brian Norris <briannorris@xxxxxxxxxxxx> wrote:
> > On Mon, Dec 18, 2017 at 12:43:48PM +0100, Greg Kroah-Hartman wrote:
> >> On Fri, Dec 15, 2017 at 07:05:39PM -0800, Matthias Kaehlcke wrote:
> >>> We identified the above patch as the culprit, in combination with USB
> >>> autosuspend being enabled for the Bluetooth controller.
> >>>
> >>> We found that the following (recent) upstream patch fixes the issue on
> >>> most devices (we are still investigating one case on a specific device):
> >
> > A key word to highlight here is "most". I have at least one device that
> > is consistently having problems even with both patches included; the
> > only way things work reliably so far is to simply revert the $subject
> > patch in 4.4.x.
>
> The problem we have is that the QCA Rome Bluetooth stops working
> after system S3. The reset resume quirk workaround the issue on both
> mainline and 4.4.x (at least for me).

Understood. But that's not the case for all systems, and on some, you're
regressing functionality.

Many systems keep power enabled for system suspend, and so that "fix" is
not necessary for them. It's also not very precise, because it seems to
act in many cases where it need not -- for one, it takes effect on *all*
resume attempts, even resuming from autosuspend. I really doubt your
system is losing USB power in S0 + USB autosuspend?

So, if you really need this patch for some systems, it seems like it
should be much better targeted. You're currently adding a lot of
unnecessary overhead and fragility.

> > I'll try to investigate further, since this result is a bit confusing
> > still. If there's a real problem with the patch (and I suspect there
> > might be), it probably shouldn't be in mainline eitherâ
>
> Do you have the same issue on mainline?

Refer to my note [1] :)

But because you asked, I did the work necessary to boot mainline on this
system, and in fact, I see the same problem. I think that's enough
reason to revert your patch in all branches (upstream, and any -stable
branch that already took it).

To be clear, here's the kernel logs on 4.15-rc4+, where the 55-second
mark is around where I try to power on and scan for BT devices (power-on
and scan both fail):

[ 2.323475] usb 3-1: new full-speed USB device number 2 using ohci-platform
...
[ 2.731719] usb 3-1: New USB device found, idVendor=0cf3, idProduct=e300
...
[ 2.867870] usb 3-1: New USB device strings: Mfr=0, Product=0, SerialNumber=0
...(udev doesn't run until here)...
[ 35.733139] usbcore: registered new interface driver btusb
[ 35.740912] Bluetooth: hci0: using rampatch file: qca/rampatch_usb_00000302.bin
[ 35.749216] Bluetooth: hci0: QCA: patch rome 0x302 build 0x3e8, firmware rome 0x302 build 0x111
...
[ 35.816104] Bluetooth: hci0: using NVM file: qca/nvm_usb_00000302.bin
...
[ 55.073503] usb 3-1: reset full-speed USB device number 2 using ohci-platform
...
[ 57.772513] Bluetooth: hci0: urb 00000000ec39086b failed to resubmit (113)
[ 57.780217] Bluetooth: hci0: urb 0000000066228fda failed to resubmit (113)

Whereas reverting the $subject patch yields no reset and URB failure;
Bluetooth seems to work fine.

> >>> commit a0085f2510e8976614ad8f766b209448b385492f
> >>> Author: Sukumar Ghorai <sukumar.ghorai@xxxxxxxxx>
> >>> Date: Wed Aug 16 14:46:55 2017 -0700
> >>>
> >>> Bluetooth: btusb: driver to enable the usb-wakeup feature
> > [...]
> >>> stable branches are currently broken for BTUSB_QCA_ROME with USB
> >>> autosuspend enabled, since the above patch is not included (I only
> >>> checked v4.4 and v4.9), so we probably want to integrate it.
> >>
> >> Now queued up, thanks for letting me know.
> >
> > I think you have almost as much information as I do at this point, and
> > I'll try to reply back here if I figure out anything more, so I'll leave
> > you to your decisions. But I would personally revert, not backport more
> > patches.
> >
> > Among the reasons:
> > (a) I have at least one device that does not work better with both
> > patches [1]
> > (b) So far, I haven't seen any explanation why commit a0085f2510e8
> > should be a dependency for $subject patch; the closest I can imagine
> > would be that commit a0085f2510e8 could effectively neutralize
> > $subject patch -- if the device is marked as a wakeup source, then
> > it will never try to RESET on resume -- but that's still a fuzzy
> > signal; just because it's marked a wakeup source sometimes doesn't
> > mean it always will be. We can disable it in user space too.
>
> Hi Leif,
>
> Can you try if a0085f2510e8 breaks your $subject commit?
> If itâs a yes, then what Brian suggests is correct.
>
> Also, can you share the output of "usb-deviceâ for your btusb device?

FWIW, here's mine:

T: Bus=03 Lev=01 Prnt=01 Port=00 Cnt=01 Dev#= 2 Spd=12 MxCh= 0
D: Ver= 2.01 Cls=e0(wlcon) Sub=01 Prot=01 MxPS=64 #Cfgs= 1
P: Vendor=0cf3 ProdID=e300 Rev=00.01
C: #Ifs= 2 Cfg#= 1 Atr=e0 MxPwr=100mA
I: If#= 0 Alt= 0 #EPs= 3 Cls=e0(wlcon) Sub=01 Prot=01 Driver=btusb
I: If#= 1 Alt= 0 #EPs= 2 Cls=e0(wlcon) Sub=01 Prot=01 Driver=btusb

This is an integrated Wifi+BT QCA6174A-5 M.2 module.

> > (c) Isn't it safer to revert than to backport more? You're delving into
> > feature-land, not bugfix-landâ
>
> Sounds reasonable.

Shall I post a proper revert for mainline, with Fixes tags and CC
stable? Or did one of you want to do that?

> Weâll need more information from Leif if we need to do ID-specific quirks.

As I see it, this probably isn't just an ID-specific thing; this patch
looks bad for its intended purpose, per my comments above (it's
overkill, for one). I think there are many factors involved here, and it
needs more study than "I just booted, suspended/resumed once, and it
worked better."

Among the things to consider:
* How do we determine the likelihood that the device gets powered off in
S3? The device_may_wakeup() check can be influenced (among other
things) by user space policy, so that doesn't really look right to me.
(ID-based checks might help a little, since you can probably
differentiate chips that are likely used on discrete/removable USB
parts, rather than internal chips?)
* The ordering: right now, you're enabling "reset on resume" at probe
time, but QCA ROME devices only load their firmware in btusb_open().
This gives a window of time in which you're adding weird reset
behavior while the device may be running its ROM firmware, not the
patched firmware loaded from /lib/firmware/. That might be one reason
things don't work out so well?
I also suspect that might be one reason that Matthias sometimes saw
the problem disappear on his devices, where I didn't. I have my device
configured to cold reset the BT device on every reboot, whereas his
isn't; it might retain the FW in the BT device's RAM across reboot.
So my behavior was more deterministic :)
* The behavior here is affected by USB autosuspend, not just S3. I
mentioned this above, but more specifically, ChromeOS has udev rules
like this, to enable autosuspend on various sorts of devices
(including bluetooth controllers + BT HID devices):

<udev rule snippet>
ATTR{idVendor}=="0cf3", ATTR{idProduct}=="e300", GOTO="autosuspend_enable"
...
# Enable autosuspend
LABEL="autosuspend_enable"
TEST=="power/control", ATTR{power/control}="auto", GOTO="autosuspend_end"

LABEL="autosuspend_end"
</udev rule snippet>

I suspect some desktop systems might not have the same policies, so
you're not testing autosuspend in the same way I am?

Brian

> Kai-Heng
>
> > Brian
> >
> > [1] Before you ask: this particular device is not quite fully supported
> > on upstream yet, though its sibling devices are. With a bit of effort, I
> > might be able to test a clean upstream. At the moment, I'm using our
> > Chrom{e,ium}OS 4.4 kernel, where we regularly merge in -stable updates.
>
>