Re: File Corruption Bug.. continued

Chris Adams (cadams@ro.com)
23 Jun 1999 12:56:43 -0500


Once upon a time, Alan Cox <alan@lxorguk.ukuu.org.uk> said:
>I've now been through most of the 2.2.7->2.2.9 diff set discarding stuff
>that seems not to be a viable candiate.
>
>The remaining suspects are:
> o Quota - which has big 2.2.7->2.2.9 changes.
> o The small scsi changes (dubious)
> o A small mm change
>
>There are other candiates - notably
> TCP changes
> interrupt changes
> IRDA
> NFS
>
>The tcp changes ought to have shown up more than this does. The interrupt
>changes dont really seem to explain cross platform stuff. And I doubt most
>people running these tests were running irda. NFS is a possibility.
>
>So - are most people seeing the problems running quotas ? And would someone
>with their brain firmly wrapped around the page cache/vfs verify this change
>that was made is absolutely safe

Well, I don't run quotas on the system I had the FS corruption with (it
is my desktop box at work). I do have SCSI (aic7xxx, which had a driver
update in 2.2.10 that now enables tagged queueing by default IIRC), but
I don't have IRDA or NFS (client or server). I did have NFS compiled in
2.2.9 when I had trouble, but not in 2.2.10 when I had trouble again.

With 2.2.5 on it, I ran a script (taken from the pages about the AMD K6
with Linux) that basically does "make -j4" of the kernel and then
compares the results from run to run. I use this as a good burnin test
of systems these days. I ran this burnin on 2.2.5 for about 18 hours
and it didn't have any trouble.

What I will try to do (probably will be a day or two) is setup a scratch
partition on my system and boot 2.2.10. I'll mount all the other
partitions read-only, and then run the burnin script with the kernel
source on the scratch partition. If it fails (and I think it will after
a while), then at least I'll have a way to reproduce the problem and
test changes.

The first time I had corruption (under 2.2.9) I was doing something
like:

find /usr/local/src/linux -type f -name '*.[ch]' -print | xargs fgrep foo

That would run okay, but then I'd try a different search and I'd start
getting the "attempt to read past end of device" and readdir errors.
Then, when I stopped the find and did "ls /usr/local/src/linux", I got
an empty directory.

IIRC I was doing something similar when the corruption happened again.

I thought it was a hardware problem at first, because I am using an
older drive. I've never seen corruption like this in all the time I
have been using Linux (back to 0.99 days), although I do avoid the known
bad development kernels always. But then I realized that I didn't get
ANY SCSI errors and that when I run 2.2.5 I can't make the same thing
happen no matter how hard I try.

-- 
Chris Adams <cadams@ro.com> - System Administrator
Renaissance Internet Services - IBS Interactive, Inc.
Home: http://ro.com/~cadams - Public key: http://ro.com/~cadams/pubkey.txt
I don't speak for anybody but myself - that's enough trouble.

- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majordomo@vger.rutgers.edu Please read the FAQ at http://www.tux.org/lkml/