Re: 2.6.22-rc2-mm1

From: Alan Stern
Date: Wed May 23 2007 - 13:00:53 EST


On Wed, 23 May 2007, Andrew Morton wrote:

> > > > This is intermittently getting resume-from-RAM failures. It is not
> > > > sufficiently repeatable to be able to bisect.
> > > >
> > > > [ 1381.119362] PM: Preparing system for mem sleep
> > > > [ 2331.798452] Stopping tasks ...
> > > > [ 2351.760431] Stopping kernel threads timed out after 20 seconds (2 tasks refusing to freeze):
> > > > [ 2351.762385] ksuspend_usbd
> > > > [ 2351.764374] khubd
> > > > [ 2351.766338] Restarting tasks ... done.
> > >
> > > Hmm, that seems to be related to usb-fix-suspend-to-ram.patch (probably one of
> > > the threads is waiting for a completion by some other thread that has been
> > > frozen already).
> >
> > Is it possible to get an Alt-SysRq-T stack trace during those 20
> > seconds? Knowing what those threads are waiting for would be a big
> > help.

> The trace is at http://userweb.kernel.org/~akpm/tasks.txt. Interesting
> bits are
>
> [ 144.201264] khubd D 00400005 0 160 2 (L-TLB)
> [ 144.204358] c207fe78 00000046 90399a85 00400005 00000246 c207fe60 c25b0cc4 c206f4cc
> [ 144.204539] 00000286 00000000 769e4cea 0040000a 90399a85 00400005 c32713c0 c207fed4
> [ 144.207754] 00000001 c207fe94 c207febc c02e8e1b 00000000 00000000 00000000 00000000
> [ 144.210934] Call Trace:
> [ 144.217012] [<c02e8e1b>] wait_for_completion+0x68/0x91
> [ 144.220090] [<c011824f>] default_wake_function+0x0/0x9
> [ 144.223158] [<c0127a41>] flush_cpu_workqueue+0x4d/0x55
> [ 144.226223] [<c0127a69>] wq_barrier_func+0x0/0x8
> [ 144.229269] [<c026343d>] usb_release_dev+0x28/0x63
> [ 144.232340] [<c0233011>] device_release+0x37/0x7c
> [ 144.235431] [<c01cb6c7>] kobject_cleanup+0x3d/0x54
> [ 144.238520] [<c01cb6de>] kobject_release+0x0/0x8
> [ 144.241631] [<c01cc2a7>] kref_put+0x75/0x82
> [ 144.244699] [<c0265482>] hub_thread+0x376/0xa74
> [ 144.247768] [<c01180c2>] pick_next_task_fair+0xf2/0x12a
> [ 144.250815] [<c0116af1>] __wake_up_common+0x31/0x4f
> [ 144.253864] [<c012a259>] autoremove_wake_function+0x0/0x35
> [ 144.256902] [<c026510c>] hub_thread+0x0/0xa74
> [ 144.259944] [<c012a102>] kthread+0x36/0x5c
> [ 144.262891] [<c012a0cc>] kthread+0x0/0x5c
> [ 144.265757] [<c010464b>] kernel_thread_helper+0x7/0x10
> [ 144.268716] =======================
>
>
> [ 144.137704] ksuspend_usbd D 00400005 0 157 2 (L-TLB)
> [ 144.140830] c2085f18 00000046 9072767a 00400005 c20626f0 c010449b c3182118 c206288c
> [ 144.141011] c3182120 c3182120 76d728df 0040000a 9072767a 00400005 c3271200 c3182118
> [ 144.144263] c3182120 00000246 c20626f0 c02ea1c9 00000000 00000000 00000000 00000000
> [ 144.147576] Call Trace:
> [ 144.153929] [<c010449b>] common_interrupt+0x23/0x28
> [ 144.157245] [<c02ea1c9>] __down+0xba/0xc6
> [ 144.160528] [<c011824f>] default_wake_function+0x0/0x9
> [ 144.163832] [<c02664fc>] hcd_resume_work+0x0/0x43
> [ 144.167126] [<c02e9fd3>] __down_failed+0x7/0xc
> [ 144.170372] [<c0266518>] hcd_resume_work+0x1c/0x43
> [ 144.173603] [<c01278cf>] run_workqueue+0x6d/0xdf
> [ 144.176780] [<c0127b4c>] worker_thread+0x0/0xd0
> [ 144.179885] [<c0127b4c>] worker_thread+0x0/0xd0
> [ 144.182930] [<c0127c12>] worker_thread+0xc6/0xd0
> [ 144.185964] [<c012a259>] autoremove_wake_function+0x0/0x35
> [ 144.189056] [<c012a102>] kthread+0x36/0x5c
> [ 144.192118] [<c012a0cc>] kthread+0x0/0x5c
> [ 144.195153] [<c010464b>] kernel_thread_helper+0x7/0x10


Okay, it's clear that the two threads are in deadlock. It's not clear
how the deadlock arose to begin with -- apparently there was a remote
wakeup request for a root hub at the same time as a device below that
root hub was disconnected, which doesn't make much sense.

Anyway, this looks like a good place to use cancel_work_sync(). The
patch below is highly untested, so Andrew, you're the guinea pig. :-)
If it seems to help, I'll submit it with a proper Changelog entry.

Alan Stern


Index: usb-2.6/drivers/usb/core/hub.c
===================================================================
--- usb-2.6.orig/drivers/usb/core/hub.c
+++ usb-2.6/drivers/usb/core/hub.c
@@ -1294,6 +1294,7 @@ void usb_disconnect(struct usb_device **
*pdev = NULL;
spin_unlock_irq(&device_state_lock);

+#ifdef CONFIG_USB_SUSPEND
/* Synchronize with the ksuspend thread to prevent any more
* autosuspend requests from being submitted, and decrement
* the parent's count of unsuspended children.
@@ -1303,6 +1304,10 @@ void usb_disconnect(struct usb_device **
usb_autosuspend_device(udev->parent);
usb_pm_unlock(udev);

+ cancel_delayed_work(&udev->autosuspend);
+ cancel_work_sync(&udev->autosuspend.work);
+#endif
+
put_device(&udev->dev);
}

Index: usb-2.6/drivers/usb/core/usb.c
===================================================================
--- usb-2.6.orig/drivers/usb/core/usb.c
+++ usb-2.6/drivers/usb/core/usb.c
@@ -184,10 +184,6 @@ static void usb_release_dev(struct devic

udev = to_usb_device(dev);

-#ifdef CONFIG_USB_SUSPEND
- cancel_delayed_work(&udev->autosuspend);
- flush_workqueue(ksuspend_usb_wq);
-#endif
usb_destroy_configuration(udev);
usb_put_hcd(bus_to_hcd(udev->bus));
kfree(udev->product);

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/