Re: ext2 filesystem corruption?!?!??

Theodore Y. Ts'o (tytso@MIT.EDU)
Mon, 31 Mar 1997 15:15:26 -0500


Date: Mon, 31 Mar 1997 10:27:22 +0000 (GMT)
From: Mark Hemment <markhe@nextd.demon.co.uk>

In fs/inode.c, truncate_inode_pages() is called from clear_inode() (to
remove pages from the named-page cache). If there are locked pages in the
inode->i_pages list, then the truncate operation will sleep (the lock
pages probably come from reading-ahead).
While truncate_inode_pages() is sleeping, __iget() is called from another
task for the inode which is being cleared. This task gets the inode,
which is then zero-filled when the truncate operation completes.
Alternatively, when truncate_inode_pages() is sleeping, get_empty_inode()
is called in another context and selects the same inode!
This will most likely happen under heavy load.
Below is a v. simply patch, it removes the inode from the hash and free
lists _before_ truncating (as the named-page cache is indexed via the
in-core inode address we shouldn't have any races in the cache itself).
The best fix is to totally re-write inode.c...

Nice work!

Here's a patch which I developed to attempt to solve some Linux-AFS
cache corruption problems for 2.0.29; the patch should also apply to
2.1.30, which I believe to be safe. It also appears that Thomas
Schoebel-Theuer fs/inode.c rewrite should solve this problem.

There's actually all sorts of really scary things in the current
fs/inode.c. For example, the original lock_inode() routine does the
following:

wait_on_inode(inode)
inode->i_lock = 1

Of course, this runs into big(!!) problems if you have more than one
process waiting on an inode, since both will set inode->i_lock! This
has been fixed as well in my patch....

- Ted

--- fs/inode.c 1997/03/31 17:05:03 1.1
+++ fs/inode.c 1997/03/31 20:02:49
@@ -147,7 +147,8 @@

static inline void lock_inode(struct inode * inode)
{
- wait_on_inode(inode);
+ while (inode->i_lock)
+ wait_on_inode(inode);
inode->i_lock = 1;
}

@@ -173,8 +174,8 @@
{
struct wait_queue * wait;

+ lock_inode(inode);
truncate_inode_pages(inode, 0);
- wait_on_inode(inode);
if (IS_WRITABLE(inode)) {
if (inode->i_sb && inode->i_sb->dq_op)
inode->i_sb->dq_op->drop(inode);
@@ -186,6 +187,7 @@
nr_free_inodes++;
memset(inode,0,sizeof(*inode));
((volatile struct inode *) inode)->i_wait = wait;
+ wake_up(&inode->i_wait);
insert_inode_free(inode);
}

@@ -512,6 +514,7 @@
sleep_on(&inode_wait);
goto repeat;
}
+found_good:
if (best->i_lock) {
wait_on_inode(best);
goto repeat;
@@ -522,7 +525,7 @@
}
if (best->i_count)
goto repeat;
-found_good:
+
clear_inode(best);
best->i_count = 1;
best->i_nlink = 1;