Re: [REGRESSION] 2.6.24/25: random lockups when accessing externalUSB harddrive

From: Stefan Becker
Date: Wed Jun 25 2008 - 12:05:47 EST


Hi,

ext Alan Stern wrote:
On Tue, 24 Jun 2008, Stefan Becker wrote:

So the lockup is caused by an already locked hcd_urb_list_lock. Is there a way to see the lock holder? Or any other suggestions how to proceed?

Good work!

Well, I guess I'm just lucky it didn't turn into a heisenbug with all those printk's in the code :-)

The usage in usb_hcd_link_urb_to_ep() appears benign; the code doesn't do anything that might hang while holding the lock. All it does is manipulate a linked list.

Unfortunately I could only run a small test today. I added some simple debugging code for the spinlock usage in hcd.c (see attached diff) and I get the following message at lockup (I tried it twice just to be sure):

HCD URB list locked by usb_hcd_link_urb_to_ep!

As far as I understand the matter this only can happen if usb_hcd_link_urb_to_ep() gets interrupted while holding the spinlock. But according to the contract at the header of the function it should be called with interrupts disabled!

I guess the obvious way forward from here is:

- replace the spin_lock() in the function with the irqsave version

- if that fixes the problem add debugging code to the function and panic with a stack trace when the interrupts aren't disabled one entry (don't know how to detect that yet, any suggestions?) That hopefully identifies the culprit that calls the function with interrupts enabled.

Regards,

Stefan

---
Stefan Becker
E-Mail: Stefan.Becker@xxxxxxxxx
diff --git a/drivers/usb/core/hcd.c b/drivers/usb/core/hcd.c
index 09a53e7..20db659 100644
--- a/drivers/usb/core/hcd.c
+++ b/drivers/usb/core/hcd.c
@@ -45,6 +45,12 @@
#include "hcd.h"
#include "hub.h"

+#ifdef DEBUG
+const char *_urb_list_holder = "NONE";
+#define URB_LIST_HOLDER(n) _urb_list_holder = #n;
+#else
+#define URB_LIST_HOLDER(n)
+#endif

/*-------------------------------------------------------------------------*/

@@ -1002,6 +1008,7 @@ int usb_hcd_link_urb_to_ep(struct usb_hcd *hcd, struct urb *urb)
int rc = 0;

spin_lock(&hcd_urb_list_lock);
+ URB_LIST_HOLDER(usb_hcd_link_urb_to_ep)

/* Check that the URB isn't being killed */
if (unlikely(urb->reject)) {
@@ -1107,7 +1114,15 @@ EXPORT_SYMBOL_GPL(usb_hcd_check_unlink_urb);
void usb_hcd_unlink_urb_from_ep(struct usb_hcd *hcd, struct urb *urb)
{
/* clear all state linking urb to this dev (and hcd) */
+#ifdef DEBUG
+ if (!spin_trylock(&hcd_urb_list_lock)) {
+ printk(KERN_CRIT "HCD URB list locked by %s!\n", _urb_list_holder);
+#endif
spin_lock(&hcd_urb_list_lock);
+ URB_LIST_HOLDER(usb_hcd_unlink_urb_from_ep)
+#ifdef DEBUG
+ }
+#endif
list_del_init(&urb->urb_list);
spin_unlock(&hcd_urb_list_lock);
}
@@ -1448,6 +1463,7 @@ void usb_hcd_flush_endpoint(struct usb_device *udev,

/* No more submits can occur */
spin_lock_irq(&hcd_urb_list_lock);
+ URB_LIST_HOLDER(usb_hcd_flush_endpoint_INITIAL)
rescan:
list_for_each_entry (urb, &ep->urb_list, urb_list) {
int is_in;
@@ -1482,6 +1498,7 @@ rescan:

/* list contents may have changed */
spin_lock(&hcd_urb_list_lock);
+ URB_LIST_HOLDER(usb_hcd_flush_endpoint_RESCAN)
goto rescan;
}
spin_unlock_irq(&hcd_urb_list_lock);
@@ -1489,6 +1506,7 @@ rescan:
/* Wait until the endpoint queue is completely empty */
while (!list_empty (&ep->urb_list)) {
spin_lock_irq(&hcd_urb_list_lock);
+ URB_LIST_HOLDER(usb_hcd_flush_endpoint_LIST)

/* The list may have changed while we acquired the spinlock */
urb = NULL;