1.3.88: Lockup _and_ trashed floppy data under load

Jamie Lokier (jamie@rebellion.co.uk)
Tue, 16 Apr 96 00:20 BST


Dear all,

1. I was using 1.3.88 to back up another Linux box onto an HP SCSI tape
drive. With about 300k/s coming over the network and onto the tape,
I tried reading a file of about 200k from a floppy. (You may see the
"bsmtp_by_floppy" transport in the mail headers :-). Unfortunately,
the file was corrupt -- how fortunate that my script does an MD5
signature check. So I threw away the disk (they're always going
wrong...), and went and got the file again. And again it was
corrupt. A little while later I found that each time I read the
file, from the same floppy, I was getting different results. And the
file was not corrupt on the floppy.

Unfortunately, I only did MD5 checksums of the file. I didn't look
to see if most of it was a mess, or if only the occasional byte or
whatever was wrong.
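A rough sketch of the repeat-read check described above, extended to record what an MD5 comparison alone does not show: how many bytes differ from the first read and where the first difference falls. This is a Python illustration, not the script mentioned in the report; `read_once`, `diff_summary`, `compare_reads` and the `runs` default are hypothetical names chosen here. One assumption worth flagging: repeated reads of the same file may be served from the buffer cache rather than the floppy. The `os.posix_fadvise(..., POSIX_FADV_DONTNEED)` call is only a best-effort hint to drop cached pages (Linux/POSIX only, skipped elsewhere), so unmounting and remounting the floppy between runs would be needed to be sure each read really comes from the medium.

```python
import hashlib
import os
import sys


def read_once(path):
    """Read the whole file, then hint the kernel to drop its cached pages.

    The fadvise call is a best-effort hint only and does not guarantee
    that the next read goes back to the device.
    """
    with open(path, "rb") as f:
        data = f.read()
        if hasattr(os, "posix_fadvise"):
            try:
                os.posix_fadvise(f.fileno(), 0, 0, os.POSIX_FADV_DONTNEED)
            except OSError:
                pass  # Some filesystems reject the hint; ignore it.
    return data


def diff_summary(reference, data):
    """Compare two reads byte by byte over their common length."""
    diffs = [k for k, (a, b) in enumerate(zip(reference, data)) if a != b]
    return {
        "length_delta": len(data) - len(reference),
        "bytes_differing": len(diffs),
        "first_diff_offset": diffs[0] if diffs else None,
    }


def compare_reads(path, runs=5):
    """Read `path` `runs` times; return MD5 digests and diffs against run 0."""
    reads = [read_once(path) for _ in range(runs)]
    digests = [hashlib.md5(r).hexdigest() for r in reads]
    summaries = [diff_summary(reads[0], r) for r in reads[1:]]
    return digests, summaries


if __name__ == "__main__" and len(sys.argv) > 1:
    n = int(sys.argv[2]) if len(sys.argv) > 2 else 5
    digests, summaries = compare_reads(sys.argv[1], n)
    for i, d in enumerate(digests):
        print(f"run {i}: md5 {d}")
    for i, s in enumerate(summaries, start=1):
        print(f"run {i} vs run 0: {s}")
```

Run as, for example, `python compare_reads.py /mnt/floppy/somefile 10` (path hypothetical). Identical digests on every run mean the reads agree; a small `bytes_differing` count would point at occasional flipped bytes, while a large one suggests whole blocks arriving wrong.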

The point is that the data was getting corrupted while Linux was
simultaneously busy transferring 300k/s over the LAN and writing to a
SCSI tape drive. This might be related to Stephen Davies' SCSI
disk/tape corruption problems. Perhaps there is a race condition
somewhere.

2. Later, after the backup had finished failing (another issue
entirely) and without having rebooted, I decided to do a network
speed test. Sadly, I forget exactly whether it was NFS reads or
`tcpspray', but the system locked up a couple of seconds into it.
(Unhelpful, I know -- I'll try to get it to happen again.) This is
unusual, as I often test those things without any trouble.

I'm using 1.3.89 now and haven't yet had any trouble. But then I
haven't been doing SCSI tape stuff.

-- Jamie Lokier