Re: Dell Inspiron 5558/0VNM2T hangs at resume from suspend when USB 3 is enabled

From: Mathias Nyman
Date: Mon Mar 20 2017 - 10:21:47 EST


On 19.03.2017 23:29, Diego Viola wrote:
On Fri, Mar 17, 2017 at 5:18 PM, Diego Viola <diego.viola@xxxxxxxxx> wrote:
On Fri, Mar 17, 2017 at 1:57 PM, Diego Viola <diego.viola@xxxxxxxxx> wrote:
On Fri, Mar 17, 2017 at 1:24 PM, Alan Stern <stern@xxxxxxxxxxxxxxxxxxx> wrote:
On Fri, 17 Mar 2017, Diego Viola wrote:

Hi,

Here's the log to the netconsole dmesg capture, I've used
xhci_hcd.dyndbg no_console_suspend=1 as the kernel parameters.

I did the usual suspend/resume cycle with i3lock, it hung after the
third attempt when trying to resume from suspend.

https://bugzilla.kernel.org/attachment.cgi?id=255309

I'm not an expert on xHCI. This should be CC'ed to the xhci-hcd
maintainer.

Alan Stern


Please let me know if I should provide something else.

Thanks,
Diego



I've forwarded my email to Mathias Nyman.

Diego

Still a problem with 4.11.0-rc2-ARCH+


From a quick glance it looks like rtsx_usb_ms probaly takes a mutex (&ucr->dev_mutex)
and then issues a usb_bulk_msg() and waits for it to complete with mutex held.
The usb message times out, usb core kills the urb but the URB probably never gets completed,
and function never returns.

Everyting using ucr->dev_mutex would block, for example the kthread, rtsx_usb_detect_ms_card
that continuously tries to detect a ms card, takes and releases the same ucr->dev_mutex for
each try.

[ 614.026502] INFO: task kworker/u8:0:5 blocked for more than 120 seconds.
[ 614.027865] Not tainted 4.10.3-1-ARCH #1
[ 614.029116] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[ 614.030467] kworker/u8:0 D 0 5 2 0x00000000
[ 614.031812] Workqueue: kacpi_hotplug acpi_hotplug_work_fn
[ 614.033179] Call Trace:
[ 614.034550] __schedule+0x22f/0x700
[ 614.035940] schedule+0x3d/0x90
[ 614.037334] schedule_preempt_disabled+0x15/0x20
[ 614.038680] __mutex_lock_slowpath+0x19b/0x2d0
[ 614.040067] ? flush_workqueue+0x204/0x580
[ 614.041456] mutex_lock+0x23/0x30
[ 614.042163] acpi_device_hotplug+0x43/0x3e7
[ 614.042882] acpi_hotplug_work_fn+0x1e/0x29
[ 614.043612] process_one_work+0x1e5/0x470
[ 614.044356] worker_thread+0x48/0x4e0
[ 614.045077] kthread+0x101/0x140
[ 614.045788] ? process_one_work+0x470/0x470
[ 614.046495] ? kthread_create_on_node+0x60/0x60
[ 614.047215] ret_from_fork+0x2c/0x40
[ 614.047950] INFO: task rtsx_usb_ms_1:235 blocked for more than 120 seconds.
[ 614.048697] Not tainted 4.10.3-1-ARCH #1
[ 614.049465] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[ 614.050265] rtsx_usb_ms_1 D 0 235 2 0x00000000
[ 614.051064] Call Trace:
[ 614.051841] __schedule+0x22f/0x700
[ 614.052626] schedule+0x3d/0x90
[ 614.053411] usb_kill_urb.part.4+0x6c/0xa0 [usbcore]
[ 614.054198] ? wake_atomic_t_function+0x60/0x60
[ 614.055005] usb_kill_urb+0x21/0x30 [usbcore]
[ 614.055819] usb_start_wait_urb+0xe5/0x170 [usbcore]
[ 614.056652] usb_bulk_msg+0xbd/0x160 [usbcore]
[ 614.057489] rtsx_usb_send_cmd+0x63/0x90 [rtsx_usb]
[ 614.058306] rtsx_usb_read_register+0x6c/0xc0 [rtsx_usb]
[ 614.059118] rtsx_usb_detect_ms_card+0x98/0x120 [rtsx_usb_ms]

There is a lot going on in xhci during the last suspend befor this.
URBs are canceled, devices reset and re-enumerated, timeout while reading descriptor,
device firmware changed.

It's possible we end up in a situation where xhci never givers back the URB.

4.11-rc2 has better xhci tracing, it shows each URB enqueue and dequeue and giveback.

Could you try enabling xhci tracing before suspending (not the same as xhci verbose dynamic debug)
It will generate a lot of data, so better to remove all extra USB devices.

xhci tracing can be added with:

mount -t debugfs none /sys/kernel/debug
echo xhci-hcd >> /sys/kernel/debug/tracing/set_event

and then send the output of cat /sys/kernel/debug/tracing/trace

-Mathias