/dev/fd0 causes Kernel PANIC, stalled system, permanent D-wait

Dan St.Andre' (grillon@m3.interserv.com)
Wed, 09 Apr 1997 00:35:45 -0500


I wrote the following after a brief thread on redhat-list@redhat.com about
use of /dev/fd0 causing various system faults. We are still running the
RedHat 3.0.3 kit with the v1.x kernel and drivers shipped with RH 3.0.3.
I've read the fd0 driver code (whew, it's old and dusty according to the
remarks) and it looks okay at that level. That makes me suspect an
interaction between the FS and FD0. We see no troubles on HDA with ext2
file systems.

QUESTION: Is there klog output we can easily enable to watch for low-level
I/O errors from the diskette, so we can see why we enter D-wait ... and just
stay there? Could it be that fd0 and ext2 are fine but some other trouble
keeps us in D-wait?
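
One way to at least catch which processes are sitting in D-wait is to scan
the per-process stat files under /proc. The following is only a minimal
sketch in Python, assuming /proc is mounted and that each stat file starts
with "pid (comm) state". The function names are made up for illustration.
It shows which processes are blocked, not why they are blocked.

```python
import os

def parse_stat_line(line):
    """Return (pid, comm, state) from one /proc/<pid>/stat line."""
    # comm is wrapped in parentheses and may itself contain spaces or ')',
    # so split on the LAST ')' instead of splitting on whitespace.
    head, _, tail = line.rpartition(")")
    pid_text, _, comm = head.partition("(")
    state = tail.split()[0]
    return int(pid_text), comm, state

def processes_in_d_wait(proc_root="/proc"):
    """List (pid, comm) for every process whose state field is 'D'."""
    stuck = []
    for entry in os.listdir(proc_root):
        if not entry.isdigit():
            continue  # not a process directory
        try:
            with open(os.path.join(proc_root, entry, "stat")) as handle:
                pid, comm, state = parse_stat_line(handle.read())
        except (OSError, ValueError, IndexError):
            continue  # process went away, or the format was unexpected
        if state == "D":
            stuck.append((pid, comm))
    return stuck

if __name__ == "__main__":
    for pid, comm in processes_in_d_wait():
        print(pid, comm)
```

Running this from a second console while the diskette test is going would
show whether it is the writer, mount, or umount that ends up stuck.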

We use a lot of sneaker net from a Linux server to other Linux boxes
that have only a single PPP connection each. Application parameter files are
loaded from the server/PC of origin to the target machine using diskettes --
one per application. This means a lot of mount/umount cycles on a diskette.
We cannot seem to get auto-fsck to work, either through tune2fs commands or
using options to mkfs.ext2.
Steve Coile suggested that we try "errors=continue" in fstab. We did,
but the diskette superblock already says that. We added "check=strict" too.
The diskette drives that we have the most trouble with are TEAC units --
slim jobs that usually go into a notebook computer. They are "green drives"
and we initially suspected some idle wind-down/run-up handshake. TEAC
says, "... they run under DOS ..." and offered little further help.

----------------------
Folks,
More "evidence": Make a new file system on a known good diskette.
Mount it with any set of options you like. Write to the diskette
and "... fill that sucker plum full..."

1) The process goes into D wait
2) Sometimes you get a Kernel panic
3) "-t msdos" is worse than "-t ext2" file systems

We wrote a little tester that does mkfs --> mount --> fill --> umount
in a forever loop. The first tester was a script that worked fine 95+% of
the time. The second tester was a C program that wrote 15 arrays of 100K each
to the diskette. [Hmm, 15*100,000 = 1,500,000 is bigger than 1,440,000.]
Usually the write fails, no problem. Sometimes we get D-wait or a Kernel panic.
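
The fill step of such a tester can be sketched as follows. This is an
illustration in Python, not the original C program. The function name, the
fill byte and the fsync after every chunk are assumptions, and "100K" is
taken as 100,000 bytes. The mkfs, mount and umount steps are left out
because they need root and depend on the local setup.

```python
import os

DISKETTE_BYTES = 1440 * 1024   # raw size of a "1.44 MB" diskette: 1,474,560
CHUNK_BYTES = 100 * 1000       # "100K" taken here as 100,000 bytes
CHUNK_COUNT = 15               # 15 chunks = 1,500,000 bytes, too many to fit

def fill(path, chunk_bytes=CHUNK_BYTES, chunk_count=CHUNK_COUNT):
    """Write chunk_count chunks to path; return (bytes_written, error)."""
    chunk = b"\xa5" * chunk_bytes
    written = 0
    try:
        with open(path, "wb") as out:
            for _ in range(chunk_count):
                out.write(chunk)
                out.flush()
                os.fsync(out.fileno())  # push this chunk to the device now
                written += chunk_bytes
    except OSError as exc:
        # On a full diskette the expected, harmless outcome is ENOSPC.
        return written, exc
    return written, None
```

Pointed at a file under the diskette's mount point, the normal result is an
error after roughly 1.4 million bytes. A call that never returns is the
D-wait case described above.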

BTW: all of our ext2 diskettes show "errors=continue" according to tune2fs,
so it must be the default. Could it be that mount is not respecting the
superblock on diskettes? [I could not get automatic fsck, after time or
mount events, to work, so I'm suspicious.]
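
To see what the superblock on a diskette really holds, independent of
tune2fs and mount, the check-related fields can be decoded straight from the
raw bytes. The sketch below is in Python and assumes the commonly documented
ext2 layout: primary superblock at byte 1024, mount count at offset 52
within it, little-endian fields. The function name and dictionary keys are
invented for illustration. Note that these counters are normally acted on by
e2fsck when something runs it (the boot scripts, for example), not by mount
itself.

```python
import struct

SUPERBLOCK_OFFSET = 1024   # primary ext2 superblock starts at byte 1024
EXT2_MAGIC = 0xEF53
ERRORS = {1: "continue", 2: "remount-ro", 3: "panic"}

def read_check_fields(image):
    """Decode the fsck-related superblock fields from raw image bytes."""
    # Fields at superblock offsets 52..71: mount count, max mount count,
    # magic, state, errors behaviour, a 16-bit field not used here,
    # time of last check, and maximum interval between checks.
    (mnt_count, max_mnt_count, magic, state, errors, _unused,
     last_check, check_interval) = struct.unpack_from(
        "<HhHHHHII", image, SUPERBLOCK_OFFSET + 52)
    if magic != EXT2_MAGIC:
        raise ValueError("no ext2 magic number found")
    return {
        "mount_count": mnt_count,
        "max_mount_count": max_mnt_count,
        "clean": bool(state & 1),            # bit 0: cleanly unmounted
        "errors": ERRORS.get(errors, "unknown"),
        "last_check": last_check,            # seconds since the epoch
        "check_interval": check_interval,    # seconds, 0 means no time check
        "check_due_by_count": max_mnt_count > 0
                              and mnt_count >= max_mnt_count,
    }
```

Feeding it the first 2048 bytes of an unmounted diskette (or of an image
copied from one) shows whether the mount counter is advancing at all between
mount/umount cycles.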

Can someone help me with this and make me a hero?
--- Dan 0;-D
=====================================================================
"In a dragon fight, often times, the bleachers get scorched."
=====================================================================
Daniel M. St.Andre'
Software Development Manager
ASOMA Instruments, Inc.
11675 Jollyville Road
Austin, TX 78759 USA
voice: 512.258.6608 x202
fax: 512.331.9123
ofc email: 76245.125@compuserve.com
home email: grillon@interserv.com
==================================================================