Re: (fwd) ext2 filesystem corruption, tool needed

Hermann Schichl (herman@esi.ac.at)
Thu, 08 Jul 1999 02:19:14 +0200


>
> > BTW, I experienced a huge crash on three different
> > partitions while producing heavy disk activity
> > with kernel 2.2.5-22 (RedHat). Everything started
> > with 'attempt to access beyond end of device'...
>
> Please tell us about your HDD, controller, RAM, CPU. Folks are tracking
> this problem.
>
I did not mention my exact setup because I do not fit into the
2.2.9/10 scheme of filesystem corruption, and by now I am quite
sure that the reason for the corruption is a hardware fault (IDE
controller most probably, maybe RAM -- probably an IDE guru can
tell me whether 'ide lost interrupt' messages could possibly be
caused by bad RAM; the disk works under heavy stress testing w/o
problems in a different machine)

However, I want to make a suggestion:
An error like the famous 'attempt to access beyond end of device'
usually is a Bad Thing (TM), and usually data loss is an immediate
consequence after the first of these messages shows up (it probably
happened even before that). And it tends to get worse if the disk is
being synced afterwards. So most of the time the best choice seems
to be to turn off the machine (or reset it) hereby stopping the kernel
from corrupting the filesystem further. By default ext2 ignores errors
in filesystems; I think that it would be preferable to make remount ro
or panic the default, since it will save people from losing too much
data when they are not big filesystem specialists. (yes I know you can
use tune2fs to change that, but who thinks about that when installing
distribution XY on his server?)

When you are running a production server where data loss is BAD (even if
you manage to backup your data every day), you might prefer that the
machine halts (most of the time just 30 seconds earlier) and keeps corruption
as low as possible. In my case just the recovery of the important data
of the last day using the various ext2 tools took 6 hours (e.g. the first
six superblocks were gone).

Thanks for ext2ed and e2fsck!
Hermann Schichl

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@vger.rutgers.edu
Please read the FAQ at http://www.tux.org/lkml/