More filesystem corruption under 2.4.1-pre8 and SW Raid5

From: Holger Kiehl (Holger.Kiehl@dwd.de)
Date: Fri Jan 19 2001 - 02:47:23 EST


Hello

Trying to find a quick way to reproduce the filesystem corruption
I reported earlier, I have written a short program that simply creates
a certain number of files in a given directory. If I start this
program 9 times, each instance creating 50000 files (2048 bytes each)
in its own directory, and then delete these files again, I always get
filesystem corruption.

I admit that creating 50000 files in one directory is not very
common, but in my other test there were simply too many processes
creating and deleting files, and it took too long to reproduce. My
assumption is that something goes wrong as soon as a certain number
of files have been created.

The tests were done on two different machines, both SMP with SW Raid 5
and an ext2 filesystem. Under 2.4.1-pre3 and pre8 I always get filesystem
corruption. This does NOT happen under 2.2.18.

I don't know if this is due to a problem in the Raid 5, ext2 filesystem
or in the kernel. Also, I do not currently have a system with 2.4.x
without raid5. For this reason I have attached two files (one C program
and a script) with the code that corrupts my filesystem. To run it you
need to issue the following commands:

    cc -o fsd fsd.c
    mkdir testdir
    cp fsd start_fsd testdir
    cd testdir
    chmod 755 start_fsd
    ./start_fsd

Now you need to wait 3 or 4 hours, after which you should see some
ext2 errors in your syslog.

WARNING: This corrupts your filesystem really badly! Sometimes
         only the files in the testdir are affected. However, I
         had cases where other files were also affected. The
         system sometimes behaves very strangely after the test;
         programs that have always worked just crash. Reconstruction
         with fsck does not always work properly; sometimes there are
         very strange files scattered over the whole filesystem
         afterwards. So be warned: do this on a test filesystem and
         reboot the machine after the test!

Another thing I noticed is that the responsiveness of the machine
decreases dramatically as the test progresses, until it is nearly
useless. After the test is done, everything is back to normal.
The same behavior was observed under 2.2.18.

Holger




