Re: Kernel panic: 2.1.121 with SCSI DAT drive

Philippe Troin (phil@fifi.org)
15 Sep 1998 08:57:03 -0700


Kai Mäkisara <makisara@metla.fi> writes:

>
> On 14 Sep 1998, Philippe Troin wrote:
> ...
> > I also have a lot of scsi tape weirdnesses on 2.1.121. Specifically,
> > stupid mt tricks don't work anymore (mt bsfm gives I/O error
> > sometimes). No panics though using vanilla 2.1.121 on AIC78xxx with
> > Archive Python DAT drive.
> >
> The patch did not touch the bsfm command. If you change the line
> '#define DEBUG 0' in linux/drivers/scsi/st.c to '#define DEBUG 1', the
> driver writes to the console/log more information about the problems it
> encounters. Enabling the verbose SCSI messages in kernel configuration
> also helps.

I'll give it a try later on... assuming we can fix the other problem :-)

> > Plus if I try to dump some filesystems, the dump process hangs on
> > down_failed forever:
> >
> > 100 0 367 366 0 0 1028 608 wait4 S p2 0:02 dump
> > 140 0 368 367 0 0 1052 660 unix_data_w S p2 0:00 dump
> > 44 0 369 368 0 0 0 0 do_exit Z p2 0:00 dump
> > 44 0 370 368 0 0 0 0 down_failed DW p2 0:00 (dump)
> > 40 0 371 368 0 0 1028 616 down_failed D p2 0:00 dump
> >
> This sounds like the problem some people encounter but I have never been
> able to reconstruct (I will try again tonight with dump). The process is
> hanging at down() which probably means that the tape driver is waiting for
> the previously sent SCSI command to finish. There are at least the
> following two possibilities:
> 1. There is a bug in the tape driver so that it will never call up() or
> the SCSI interrupt is lost, or

Likely... (this was very reproducible)

> 2. The SCSI bus is hung.

Since I could still access everything else on the bus, not likely.
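
To illustrate possibility 1, here is a minimal user-space sketch in
Python. It is not kernel code and not taken from st.c: the function
names and the short delays are made up for the example. A waiter blocks
on a semaphore the way a process blocks in down(); if the completion
path never calls up() (modelled here as release()), the waiter only
returns when its timeout expires.

```python
import threading


def wait_for_completion(sem, timeout_s):
    """Block like down(); return True only if the semaphore was released in time."""
    return sem.acquire(timeout=timeout_s)


def run(deliver_interrupt):
    # The semaphore starts at 0, like a driver that has just sent a command
    # and is waiting for the completion path to wake it up.
    sem = threading.Semaphore(0)
    if deliver_interrupt:
        # Stand-in for the interrupt/completion path calling up().
        threading.Timer(0.05, sem.release).start()
    return wait_for_completion(sem, timeout_s=0.5)


if __name__ == "__main__":
    print("completion delivered:", run(True))
    print("completion lost:", run(False))
```

With the completion delivered the wait returns at once; with it lost,
the waiter sits there for the whole timeout, which is what a process
stuck in down_failed would look like from the outside.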

> The timeout in the tape driver is very long (900 seconds) and one needs a
> lot of patience in order to find out if the system is waiting for a
> timeout or is really hung. You can make the timeout shorter by either
> editing the driver (change ST_TIMEOUT) or using mt (mt sttimeout xxx).
> A timeout of 60 seconds would probably be enough for a DAT.

I was sure the driver was hung because, when I discovered the problem,
the dump processes had already been hung for at least 2 hours...
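
The arithmetic behind that, as a small Python sketch. The 900 second
value is the timeout quoted above and 60 seconds is the suggested DAT
value; the helper names are hypothetical, and the sketch assumes a
single command was pending rather than a series of retries.

```python
ST_TIMEOUT_S = 900  # tape driver timeout quoted in this thread, in seconds


def outlasted_timeout(stuck_for_s, timeout_s=ST_TIMEOUT_S):
    """True if the process has been stuck longer than one driver timeout."""
    return stuck_for_s > timeout_s


def timeouts_elapsed(stuck_for_s, timeout_s=ST_TIMEOUT_S):
    """How many whole timeout periods fit into the time spent stuck."""
    return stuck_for_s // timeout_s


if __name__ == "__main__":
    two_hours = 2 * 60 * 60  # 7200 seconds
    print(outlasted_timeout(two_hours))     # stuck past the 900 s timeout?
    print(timeouts_elapsed(two_hours))      # periods at 900 s
    print(timeouts_elapsed(two_hours, 60))  # periods at the suggested 60 s
```

Two hours is several times the 900 second timeout, so a single pending
command should have timed out long before I looked.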

Note that I was dumping an ext2 fs on a RAID-0 partition split across
two disks on the same SCSI controller as the DAT drive. That makes 3
devices active at the same time, plus a bunch of drivers involved (md,
aic78xx, sd, st, ext2). I never had any problems with this setup until
2.1.121, though.

If you need help reproducing the bug, email me.

Phil.
