Re: interesting FS corruption(?)

Neil Conway (nconway.list@ukaea.org.uk)
Mon, 25 Jan 1999 10:25:12 +0000


Chris Wedgwood wrote:
>
> On Mon, Jan 25, 1999 at 02:43:53AM +0100, Olaf Titz wrote:
>
> > Got this twice in a few weeks now: suddenly, all regular files in a
> > directory disappear. e2fsck finds no errors. No trace in the logs.
> > This is Linux 2.1.131 with the knfsd patches, gcc 2.8.1.
> >
> > Yes, it looks with certainty like a userlevel program gone awry.
> > However, I have unsuccessfully checked the usual suspects in all
> > crontabs and I want to rule out any kernel problem. Has anyone else
> > seen this?
>
> Yes -- once.
>
> I'm not sure exactly what version (I can look back) but something
> recent (ie. 2.1.13x or later).
>
> I was stress-testing the machine with a 'make -j' and seeing how bad
> it would swap to test the effectiveness of some vm changes...
>
> After I go lots of fork failures I control-c'd and stopped the build,
> then a few minutes later after things had gone quiet, I did a make
> clean and tried to rebuild -- only it failed!
>
> When I checked it out, some header files had gone away. I suspected
> disk corruption but e2fsck found nothing. So far, I'm lost as to why
> this occurred... completely lost. My best guess so far (kernel errors
> aside) is that make or some user-land utilities is doing bad things
> when memory allocation fails...
>
> If I could repeat this behavior is might be possible to track this.
>
> <pause>
>
> OK, I reported this:
>
> Subject: 2.2.0-pre4-ac3, heavy load and file go missing?
> Date: Wed, 6 Jan 1999 22:40:34 +1300
>
> If it matters...

Me too: but in my case I ctrl-c'ed the 'make -j' before it gave any fork
failures (unless they just happened so quickly I didn't notice but I
doubt that).

I also reported this, in two threads. The first I started ("Weird: loss
of files during kernel build (!)"), and in the second I was replying to
an existing thread ("2.1.125 filesystem corruption under extreme
load"). These were back last November.

I think the evidence is that we have a serious bug that's hard to
trigger.

Someone with an expendable filesystem :-) needs to try and reproduce
this (I'd love to volunteeer but I'm not in a position to do so for at
least a few weeks).

Neil

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@vger.rutgers.edu
Please read the FAQ at http://www.tux.org/lkml/