Re: 2.0.36 Lockups [reproducable]

Theodore Y. Ts'o (tytso@mit.edu)
Mon, 23 Nov 1998 15:42:40 -0500


Date: Fri, 20 Nov 1998 12:24:08 -0500
From: Kris Karas <ktk@ktk.bidmc.harvard.edu>

I took note of this, only because I run a high-availability linux
system (24x7) in a medical environment, and just installed 2.0.36.
When I rebooted to 2.0.36, from 2.0.34, fsck was forced on all
filesystems because the time-since-last-check was exceeded (about 160
days since last reboot). And what I found was filesystem corruption.
When the 2.0.34 kernel shut down, it noted some stuck dquots as it
was turning off quotas. Then as 2.0.36 booted and fsck ran, it
reported quite a number of inodes with zero dtime, and a few dozen
block-bitmap differences - usually what I would expect to see if
fscking on an improperly shut down filesys, excepting that no
unexpected crashes had occurred. The system also has a 3c509.
Anyhow, FYI.

Did you update shared libraries w/o rebooting? (For example, are
you using Debian? Their update procedure does do this.)
This can cause this type of zero dtime message, because the running
executables keep the old shared libraries open, so even though they get
deleted by the update procedure, the inode itself can't get freed. The
link count is zero, but the dtime field is still zero as well, and that
is what fsck complains about.
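
For anyone who wants to see the "deleted but still open" state for
themselves, here is a minimal sketch in Python, assuming a POSIX system.
The function name unlink_while_open is made up for illustration. It only
shows the link-count half of the story: dtime is an on-disk ext2 field
and is not reported by fstat, so it does not appear here.

```python
import os
import tempfile


def unlink_while_open():
    """Delete a file's only name while a descriptor is still open,
    then report what the kernel says about the orphaned inode.
    (Hypothetical helper, for illustration only.)"""
    fd, path = tempfile.mkstemp()
    try:
        os.write(fd, b"old shared library contents")
        os.unlink(path)               # remove the only directory entry
        st = os.fstat(fd)             # the inode is still reachable via fd
        os.lseek(fd, 0, os.SEEK_SET)
        data = os.read(fd, 64)        # the contents are still readable
        return {
            "name_exists": os.path.exists(path),
            "link_count": st.st_nlink,
            "data": data,
        }
    finally:
        # Closing the last descriptor is what lets the inode and its
        # blocks actually be released.
        os.close(fd)


if __name__ == "__main__":
    print(unlink_while_open())
```

The name is gone and the link count reads zero, yet the data can still
be read until the descriptor is closed. A running executable holding an
old shared library open is in the same position.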

If this happens, it's not the end of the world; the unused blocks get
cleaned up upon the next fsck, and the only bad side effect is that some
blocks which should have been available for allocation weren't
available in the meantime.

- Ted
