[REGRESSION] USB ports do not work after suspend/resume cycle with v6.6.2

From: Oleksandr Natalenko
Date: Thu Nov 23 2023 - 13:29:03 EST


Hello.

Since the v6.6.2 kernel release I've been experiencing a regression in USB port behaviour after a suspend/resume cycle.

If a USB port is empty before suspending, after resuming the machine the port doesn't work. After a device insertion there's no reaction in the kernel log whatsoever, although I do see that the device gets powered up physically. If the machine is suspended with a device inserted into the USB port, the port works fine after resume.

This is an AMD-based machine with hci version 0x110 reported. As per the changelog between v6.6.1 and v6.6.2, 603 commits were backported into v6.6.2, and one of the commits was as follows:

$ git log --oneline v6.6.1..v6.6.2 -- drivers/usb/host/xhci-pci.c
14a51fa544225 xhci: Loosen RPM as default policy to cover for AMD xHC 1.1

It seems that this commit explicitly enables runtime PM specifically for my platform. As per dmesg:

v6.6.1: quirks 0x0000000000000410
v6.6.2: quirks 0x0000000200000410

Here, bit 33 gets set, which, as expected, corresponds to:

drivers/usb/host/xhci.h
1895:#define XHCI_DEFAULT_PM_RUNTIME_ALLOW BIT_ULL(33)
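
For anyone wanting to double-check the bit arithmetic, here is a minimal sketch that XORs the two quirk masks from dmesg and lists the bit positions that differ. The helper name set_bits is mine and purely illustrative; only the two mask values come from the logs above.

```python
# Quirk masks as printed in dmesg by the two kernels (values from the report).
QUIRKS_V6_6_1 = 0x0000000000000410
QUIRKS_V6_6_2 = 0x0000000200000410


def set_bits(value: int) -> list[int]:
    """Return the positions of all set bits, lowest first (hypothetical helper)."""
    return [pos for pos in range(value.bit_length()) if (value >> pos) & 1]


# XOR keeps only the bits that differ between the two masks.
changed = QUIRKS_V6_6_1 ^ QUIRKS_V6_6_2

print(set_bits(QUIRKS_V6_6_1))  # [4, 10]: quirk bits present on both kernels
print(set_bits(changed))        # [33]: the only bit that differs
```

The single differing bit is 33, matching BIT_ULL(33) in the define quoted above.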

This commit is backported from the upstream commit 4baf12181509, which is one of 16 commits of the following series named "xhci features":

https://lore.kernel.org/all/20231019102924.2797346-1-mathias.nyman@xxxxxxxxxxxxxxx/

It appears that there was another commit in this series, also from Basavaraj (in Cc), a5d6264b638e, which was not picked for v6.6.2, but which stated the following:

Use the low-power states of the underlying platform to enable runtime PM.
If the platform doesn't support runtime D3, then enabling default RPM will
result in the controller malfunctioning, as in the case of hotplug devices
not being detected because of a failed interrupt generation.

It felt like this was exactly my case. So, I've conducted two tests:

1. Reverted 14a51fa544225 from v6.6.2. With this revert the USB ports started to work fine, just as they did in v6.6.1.
2. Left 14a51fa544225 in place, but also applied upstream a5d6264b638e on top of v6.6.2. With this patch added the USB ports also work after a suspend/resume cycle.
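
When comparing kernels like this, it may also help to confirm whether runtime PM is actually allowed for the controller. Below is a rough sketch, not something I ran as part of the tests above. It assumes the controller is bound to the xhci_hcd PCI driver and reads the standard power/control and power/runtime_status sysfs attributes; the function name xhci_runtime_pm is hypothetical.

```python
from pathlib import Path


def xhci_runtime_pm(driver_dir="/sys/bus/pci/drivers/xhci_hcd"):
    """Map each PCI device bound to xhci_hcd to its runtime PM attributes.

    Assumption: the controller uses the xhci_hcd PCI driver, so its device
    symlink appears in driver_dir. Returns {} if the directory is missing.
    """
    result = {}
    root = Path(driver_dir)
    if not root.is_dir():
        return result
    for entry in sorted(root.iterdir()):
        power = entry / "power"
        if not (power / "control").is_file():
            continue  # not a device entry (bind, unbind, new_id, module, ...)
        attrs = {}
        for name in ("control", "runtime_status"):
            try:
                attrs[name] = (power / name).read_text().strip()
            except OSError:
                attrs[name] = None  # attribute missing or unreadable
        result[entry.name] = attrs
    return result


if __name__ == "__main__":
    for dev, attrs in xhci_runtime_pm().items():
        print(dev, attrs)
```

A control value of "auto" means runtime PM is allowed for that device, while "on" means it is kept powered; runtime_status reports the current state (for example "active" or "suspended").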

This runtime PM enablement did also impact my AX200 Bluetooth device, resulting in long delays before headphones/speaker can connect, but I've solved this with btusb.enable_autosuspend=N. I think this has nothing to do with the original issue, and I'm OK with this workaround unless someone has got a different idea.
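
To verify that the btusb workaround took effect after a reboot, something like the following sketch could be used. It assumes the btusb module is loaded and exposes the parameter under /sys/module/btusb/parameters/, where boolean module parameters read back as Y or N; the function name is hypothetical.

```python
from pathlib import Path

# Assumed sysfs location of the btusb module parameter.
BTUSB_PARAM = "/sys/module/btusb/parameters/enable_autosuspend"


def btusb_autosuspend_enabled(param_path=BTUSB_PARAM):
    """Return True/False for Y/N, or None if the parameter cannot be read."""
    try:
        value = Path(param_path).read_text().strip()
    except OSError:
        return None  # module not loaded, or parameter not exposed
    if value in ("Y", "1"):
        return True
    if value in ("N", "0"):
        return False
    return None  # unexpected content


if __name__ == "__main__":
    print(btusb_autosuspend_enabled())
```

With btusb.enable_autosuspend=N on the kernel command line, this is expected to report False.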

With that, please consider either reverting 14a51fa544225 from the stable kernel, or applying a5d6264b638e in addition to it. Given that the mainline kernel has both of them, I'm in favour of applying the additional commit to the stable kernel.

I'm also Cc'ing all the people from our Mastodon discussion where I initially complained about the issue as well as about stable kernel branch stability:

https://activitypub.natalenko.name/@oleksandr/statuses/01HFRXBYWMXF9G4KYPE3XHH0S8

I'm not going to expand more on that in this email, especially given Greg indicated he read the conversation, but I'm open to continuing this discussion as I still think that the current workflow brings visible issues to ordinary users, and hence some adjustments should be made.

Thank you.

--
Oleksandr Natalenko (post-factum)
