Re: File corruption: IDE hd problem or kernel problem ?

Riccardo Facchetti (fizban@tin.it)
Wed, 27 Aug 1997 10:22:17 +0200 (MET DST)


On Tue, 26 Aug 1997, mlord wrote:

> Riccardo Facchetti wrote:
> >
> > I have downloaded the new release of communicator and at the end of
> > download (ftp) I've found that the tarball was corrupted (why ??? ... it
> > was an ftp connection !!!). Then I have splitted the tarball in 100k
> > pieces to find what piece was corrupted and then I've re-downloaded just
> > the corrupted one (thanks to an account of mine on an university machine).
> > I have extracted the tarball and I was installing it when the system
> > blocked. Not freezed because alt-sysreq was working. I have had the time
>
> We cannot be certain about the cause of the original corruption
> (short file, maybe -- download failed without telling you?).

Of course. The point is that the file was exactly as long as the
original, so it was fully downloaded, yet at the end of the operation it
was corrupted.
There is one more thing to mention. During all my pppd sessions I receive
a lot of "TCPv4 bad checksum from ..." messages. Of course all these
packets are discarded, but I would like to know why I receive so many
checksum errors.

> ....
> > After the reboot and fsck, the communicator tarball was corrupted again
>
> Mmm.. filesystem corruption could have caused the second failure,
> or maybe something flakey in the I/O cabling or PIO settings.. ??
> ....

Yes, that is a possibility (I run the drives in PIO mode 3 because PIO
mode 4 caused me problems; I have a tower case, so I suspect the cables
are too long), but ...

> > Then I tried to
> >
> > tar cfv /u/tarball.tar /usr/src/linux/
> > cp /u/tarball.tar /u2/
> > diff /u/tarball.tar /u2/tarball.tar
> >
> > and no diffs.
>
> Note: The diff is probably just diff'ing from memory,
> regardless of what actually got written to the drive.

No, I don't think so.

I have only 32 MB of RAM, and the kernel tree was already compiled, so it
contains all the .o and .a files and the intermediate kernel images.
'du -s' reports 42933 (in 1 KB blocks, about 42 MB), much more than my
total RAM.

>
> > dd if=/dev/zero of=/u2/a1 bs=1048576 count=40
> > cp /u2/a1 /u/
> > diff /u/a1 /u2/a1
> >
> > and no diffs.
>
> Ditto, unless you have less than 40Meg memory.
> For this type of test, reboot before doing the diff
> if you want to *really* test things.

No. I was careful to create a file bigger than my physical memory (40 MB
against 32 MB of RAM).
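
A minimal sketch of this kind of check, assuming GNU md5sum and a POSIX
shell (the function name copy_and_verify is made up for illustration).
Note that sync only flushes pending writes; it does not evict cached
pages, so it is still the file size exceeding RAM that forces at least
part of the comparison to be read back from disk:

```shell
# Hypothetical helper: copy SRC to DST, then compare the MD5 sums of both.
# Returns 0 if the sums match, non-zero if they differ or a step fails.
copy_and_verify() {
    cp "$1" "$2" || return 1
    sync    # flush dirty buffers to disk (does not drop cached pages)
    # Read via stdin so the output holds only the digest, not the file name.
    sum_src=$(md5sum < "$1") && sum_dst=$(md5sum < "$2") || return 1
    [ "$sum_src" = "$sum_dst" ]
}
```

For the test above this would be called as
'copy_and_verify /u2/a1 /u/a1 && echo identical'.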

>
> > kernel hdx msgs:
> >
> > hda: QUANTUM LPS270A, ATA DISK drive
> > hdb: ST51080A, ATA DISK drive
> > hdc: ST32132A, ATA DISK drive
> > hdd: LTN106A, ATAPI CDROM drive
>
> If it still corrupts when copied to disk, try copying it around
> a few times, and then reboot before doing a "diff".

Okay. Yesterday evening I retried the installation of communicator, and I
found that the corruption problem is highly reproducible (on my machine,
of course).

All I have to do is run

'split -b 100k communicator-v402b7-export.x86-unknown-linux2.0.tar.gz'

and re-download the corrupted chunks (xbt and xcn). Then, by the time I
run 'cat x* > communicator.tar.gz', xbt and/or xcn are corrupted again.
Even if I 'cp xbt xbt.saved' before concatenating, xbt.saved eventually
becomes corrupted too. I also saved all the correct files on a different
partition, and the corruption reached those copies as well (again, only
xbt and xcn).

Ah, and just for the curious: I used md5sum to decide whether a file was
corrupted or not.
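
A rough sketch of how the per-chunk check can be automated, assuming GNU
md5sum with its -c (check) option. The list file good.md5 and the
function name bad_chunks are hypothetical; the list is assumed to have
been written with 'md5sum x* > good.md5' on a machine that holds a good
copy of the chunks:

```shell
# Hypothetical helper: print the names of the chunks whose MD5 sum no
# longer matches the known-good list given as the first argument.
bad_chunks() {
    # md5sum -c prints "NAME: OK" or "NAME: FAILED ..." for each entry.
    # LC_ALL=C keeps those status words untranslated so grep can match.
    LC_ALL=C md5sum -c "$1" 2>/dev/null | grep ': FAILED' | cut -d: -f1
}
```

Running 'bad_chunks good.md5' in the directory that holds the chunks
lists only the files that need to be fetched again; running it again
later shows whether a chunk that was good has gone bad in the meantime.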

I'm starting to think that my Linux box doesn't really like this version
of communicator :)

Ciao,
Riccardo.