Re: Laptop mode causing writes to wrong sectors?

From: Bart Samwel
Date: Wed Nov 16 2005 - 15:07:44 EST


Jan Niehusmann wrote:
let me start by stating that the following is mainly guessed. I may be
completely wrong. Still I think you may be interested in my
observations, and perhaps you already got similar reports?

Nope, no similar reports. But I'm listening. :)

On my laptop, running 2.6.14, I'm observing some strange file- and
filesystem corruptions. First, I thought it may have been caused by an
ext3 bug because the first corruption I did observe happened shortly
after an ext3 journal replay.

I did report this to linux-kernel, but without any helpful response:
http://www.ussg.iu.edu/hypermail/linux/kernel/0511.0/0129.html
(Subject: ext3 corruption: "JBD: no valid journal superblock found")

Quoting your message:

# There are two things I did with the filesystem which may be related to
# this: First, on Oct. 27 I did resize the filesystem (umount, lvextend,
# e2fsck -f, resize2fs, mount). But after that I did several reboots
# without any problems - this is my notebook and I turn it on and off
# several times a day.

First of all, the fact that you resized your fs is a smoking gun, if you ask me. Your fs is dead or dying, and you know you've recently been tinkering with it. That's the most probable cause.

Secondly, I think your resize sequence is missing an e2fsck -f after resize2fs. Resizing filesystems is risky business, and I've ruined many a filesystem by resizing it, even when it came out of an fsck clean. I'm also worried that there was apparently _never_ a full fsck after the resize2fs, since all the subsequent fscks probably did nothing more than replay the journal. That way, an existing problem can persist and slowly "creep" into more and more of your files as you modify them.
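
To make the ordering concrete, here is a small Python sketch that only builds and prints the command sequence, with the extra e2fsck -f placed after resize2fs. It runs nothing. The device path, mount point and size are hypothetical placeholders, not values from your report.

```python
def resize_plan(device, mountpoint, grow_by):
    """Return the resize steps as argv lists. Nothing is executed here.

    All three arguments are hypothetical placeholders for illustration.
    """
    return [
        ["umount", mountpoint],
        ["lvextend", "-L", "+" + grow_by, device],
        ["e2fsck", "-f", device],      # full check before resizing
        ["resize2fs", device],
        ["e2fsck", "-f", device],      # the full check missing from the quoted sequence
        ["mount", device, mountpoint],
    ]


if __name__ == "__main__":
    # Dry run: print the commands instead of running them.
    for argv in resize_plan("/dev/vg0/home", "/home", "1G"):
        print(" ".join(argv))
```

The point is only the order: a forced full check on both sides of resize2fs, before the filesystem is mounted again.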

But now, I got another hint pointing to a possible cause of this
problem: I found a file - /usr/lib/libatlas.so.3.0 - which was corrupted
by 4k of it being overwritten by a different file, which I recognized. And that file happened to be an uncompressed manual page.

This sounds like your filesystem's block bitmaps are "fscked up". These problems can definitely cause "creeping corruption" when undetected, because (a) new files overwrite existing files only part of the time (especially if your filesystem has a relatively large amount of free space, as it probably does because you just resized it), and (b) you don't actually use most of your files very often, so you usually don't really notice it until it's too late.

Also, AFAIK the journal is simply a special file as far as ext3 is concerned, and perhaps the journal corruption you experienced has to do with that special file's bits being marked free, and the beginning of the journal being overwritten by other data.
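
To show what I mean by a bad block bitmap leading to one file landing on top of another, here is a toy Python model. It is a sketch under loud assumptions: the allocator, its first-fit policy and the block counts are all made up for illustration and have nothing to do with ext3's real allocator or on-disk layout.

```python
class ToyBlockAllocator:
    """Toy block bitmap. Hypothetical model, not ext3's real layout."""

    def __init__(self, n_blocks):
        self.in_use = [False] * n_blocks   # the "block bitmap"
        self.owner = [None] * n_blocks     # whose data actually sits in each block

    def allocate(self, filename, count):
        """Hand out the first `count` blocks the bitmap claims are free."""
        got = []
        for blk, used in enumerate(self.in_use):
            if len(got) == count:
                break
            if not used:
                self.in_use[blk] = True
                self.owner[blk] = filename   # silently replaces whatever was there
                got.append(blk)
        if len(got) < count:
            raise RuntimeError("toy filesystem is full")
        return got


def simulate():
    alloc = ToyBlockAllocator(8)
    lib_blocks = alloc.allocate("libatlas.so.3.0", 3)

    # Simulated damage: one of the library's blocks is wrongly marked free.
    alloc.in_use[lib_blocks[1]] = False

    # A new file is written later and is given the "free" block.
    man_blocks = alloc.allocate("manpage", 1)

    # Blocks the library still points at, but which now hold other data.
    overwritten = [b for b in lib_blocks if alloc.owner[b] != "libatlas.so.3.0"]
    return lib_blocks, man_blocks, overwritten
```

In this toy run the library still references all three of its blocks, but one of them now holds the manual page's data, and nothing complains until somebody reads the library. With lots of free space the allocator would usually pick a genuinely free block instead, which is why the damage only shows up some of the time.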

DISCLAIMER: I'm biased. I almost lost a filesystem to this exact problem once. It was ext2resize, not resize2fs. But still.

About the laptop mode hypothesis: I think it's just a coincidence. If it's not, then it could be a "sync-time-only" problem (because what laptop mode does before spindown is a sync), but not a specific laptop mode problem -- laptop mode doesn't influence block numbers whatsoever. But if it were a sync problem, we would be seeing a lot more reports of corruption. For now my vote is with the resize. :)

--Bart