Re: scsi_eh / 1394 bug - -rt7

From: Steven Rostedt
Date: Wed Oct 19 2005 - 02:11:57 EST




On Wed, 19 Oct 2005, Lee Revell wrote:

> On Tue, 2005-10-18 at 23:43 -0400, Lee Revell wrote:
> > On Tue, 2005-10-18 at 14:02 -0700, Mark Knecht wrote:
> > > Hi,
> > > I'm seeing this each time I plug in a 1394 hard drive:
> > >
> > > Attached scsi disk sdc at scsi6, channel 0, id 0, lun 0
> > > ieee1394: Node changed: 0-01:1023 -> 0-00:1023
> > > ieee1394: Node changed: 0-02:1023 -> 0-01:1023
> > > ieee1394: Reconnected to SBP-2 device
> > > ieee1394: Node 0-00:1023: Max speed [S400] - Max payload [2048]
> > > ieee1394: Node suspended: ID:BUS[0-00:1023] GUID[0050c501e00b31ec]
> > > prev->state: 2 != TASK_RUNNING??
> > > scsi_eh_6/20286[CPU#0]: BUG in __schedule at kernel/sched.c:3328
> >
> > I hit this exact same bug while at a client site today, with an external
> > USB drive.
>
> And again on my home machine running -rt1 with my digital camera!
>
> Attached scsi removable disk sda at scsi0, channel 0, id 0, lun 0
> usb 2-1: USB disconnect, address 2
> prev->state: 2 != TASK_RUNNING??
> scsi_eh_0/12648[CPU#0]: BUG in __schedule at kernel/sched.c:3326
> [<c01048b9>] dump_stack+0x19/0x20 (20)
> [<c011e766>] __WARN_ON+0x46/0x80 (12)
> [<c02c0bf7>] __schedule+0x547/0x790 (84)
> [<c012057a>] do_exit+0x26a/0x430 (28)
> [<c010147b>] kernel_thread_helper+0xb/0x10 (1020129312)
>

This is also a problem in the upstream kernel. It's just that RT catches
it! Here's the patch. I'll also write one for the upstream kernel,
although this patch would probably work there as well. But I'll make it
official.

Ingo,

Here's the patch. The problem is similar to the pcmcia bug. It seems
that the loop usually exits in the TASK_INTERRUPTIBLE state.


Lee and Mark,

Can you two try it as well and confirm that you cant get the
bug anymore.

Thanks,


-- Steve


Index: linux-2.6.14-rc4-rt9/drivers/scsi/scsi_error.c
===================================================================
--- linux-2.6.14-rc4-rt9.orig/drivers/scsi/scsi_error.c 2005-10-19 02:54:49.000000000 -0400
+++ linux-2.6.14-rc4-rt9/drivers/scsi/scsi_error.c 2005-10-19 02:57:06.000000000 -0400
@@ -1645,6 +1645,12 @@
set_current_state(TASK_INTERRUPTIBLE);
}

+ /*
+ * There's a good chance that the loop will exit in the
+ * TASK_INTERRUPTIBLE state.
+ */
+ __set_current_state(TASK_RUNNING);
+
SCSI_LOG_ERROR_RECOVERY(1, printk("Error handler scsi_eh_%d"
" exiting\n",shost->host_no));

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/