Re: Major SATA / EXT3 Issue?

From: Heikki Orsila
Date: Wed Oct 31 2007 - 05:24:29 EST


On Wed, Oct 31, 2007 at 03:25:54AM -0400, Theodore Tso wrote:
> On Tue, Oct 30, 2007 at 03:59:24PM +0200, Heikki Orsila wrote:
> > 3. fsck -p on boot failed
> >
> > (it is very probable not many files were corrupted at this stage)
>
> Maybe...

The system wouldn't have worked as it did if so many files had been
broken before fsck. For example, the system booted properly before
fsck; after fsck, grub was broken. So most of the corrupted files were
unrelated to the writes that happened that night. For example, many
files in the .deb cache under /var were corrupted, but those had not
been touched in a long time.

> > 4. I ran fsck.ext3 -y
> >
> > => that corrupted lots and lots of files. This went
> > into a loop, the fsck.ext3 restarted checking over and over again.
>
> It's possible that e2fsck corrupted the files, but they also could
> have been corrupted earlier by the earlier I/O errors.

At least there were no I/O errors in the log files before that night. The
only I/O error that happened put the filesystem into read-only state.
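
For anyone who wants to repeat that kind of log check, here is a minimal
sketch in Python. It assumes that kernel I/O errors appear in the log as
lines containing the phrase "I/O error"; the function name
find_io_errors is hypothetical and chosen only for illustration.

```python
import re
from typing import Iterable, List, Tuple

# Assumption: I/O errors are logged with the literal phrase "I/O error".
IO_ERROR = re.compile(r"I/O error", re.IGNORECASE)


def find_io_errors(lines: Iterable[str]) -> List[Tuple[int, str]]:
    """Return (line_number, text) for every line that mentions an I/O error."""
    hits = []
    for number, line in enumerate(lines, start=1):
        if IO_ERROR.search(line):
            # Strip the trailing newline so the result prints cleanly.
            hits.append((number, line.rstrip("\n")))
    return hits
```

Passing an open log file to find_io_errors() gives the matching lines
with their line numbers, so the timestamp of the first hit can be
compared against the time the filesystem went read-only.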

> There are some
> relatively rare filesystem corruptions which e2fsck doesn't handle as
> gracefully as it should, though, I will admit. I would need to see
> the e2fsck transcript to be sure.

Unfortunately, I do not have it.

> Were there any messages about needing to relocate inode tables by any
> chance?

I don't recall. I recall seeing messages about "too many blocks
inside inode" and "clearing inodes". There were hundreds or
thousands of these messages.

--
Heikki Orsila Barbie's law:
heikki.orsila@xxxxxx "Math is hard, let's go shopping!"
http://www.iki.fi/shd