Problem in "prune_icache"

From: HongChao Zhang
Date: Mon Mar 30 2009 - 05:52:47 EST


Hi

I'm from the Lustre team (Lustre is a scalable distributed file system
from Sun Microsystems), and we have encountered a deadlock in
prune_icache. The details are as follows.

While truncating a file, a new update is started in the current journal
transaction. During this processing memory runs low, so try_to_free_pages
is called to free some pages, which finally calls
shrink_icache_memory/prune_icache to free the memory occupied by cached inodes.
Note: prune_icache takes "iprune_mutex" and holds it for its whole pruning pass.

But at the same time, kswapd has already called shrink_icache_memory/prune_icache
and holds "iprune_mutex". It finds some inodes to dispose of and calls
clear_inode/DQUOT_DROP/fs-specific-quota-drop-op ("ldiskfs_dquot_drop" in our case)
to drop the dquot. This fs-specific quota drop op can call journal_start to
start a new update, but it finds that the number of buffers in the current
transaction has reached j_max_transaction_buffers, so it wakes up kjournald to
commit the transaction. kjournald then calls journal_commit_transaction, which
sets the transaction state to T_LOCKED and checks whether there are still
pending updates for the committing transaction. It finds one (the update
started by the truncate operation, see above), so it waits for that update to
complete. BUT that update cannot complete, because it cannot get the
"iprune_mutex" held by kswapd, so the deadlock is triggered.
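
To make the circular wait explicit, here is a minimal user-space sketch of the
dependency chain described above. It is only a hand-written model, not kernel
code: the enum names, the two tables and in_wait_cycle() are made up for
illustration. The second table assumes the truncate path backs off instead of
blocking on "iprune_mutex".

```c
#include <stdbool.h>

/* The three actors in the report (illustrative names, not kernel symbols). */
enum actor { TRUNCATE_TASK, KSWAPD, KJOURNALD, NR_ACTORS };

/* Sentinel value: the actor is not blocked on anyone. */
#define NOT_WAITING NR_ACTORS

/* waits[a] is the actor that 'a' is blocked on, as described in the report. */
static const enum actor reported_waits[NR_ACTORS] = {
    [TRUNCATE_TASK] = KSWAPD,        /* needs iprune_mutex, held by kswapd      */
    [KSWAPD]        = KJOURNALD,     /* journal_start() waits for the commit    */
    [KJOURNALD]     = TRUNCATE_TASK, /* commit waits for the open update        */
};

/* Same model, assuming the truncate path skips pruning instead of blocking. */
static const enum actor no_block_waits[NR_ACTORS] = {
    [TRUNCATE_TASK] = NOT_WAITING,
    [KSWAPD]        = KJOURNALD,
    [KJOURNALD]     = TRUNCATE_TASK,
};

/* Follow the wait chain from 'start'; true if it leads back to 'start'. */
static bool in_wait_cycle(const enum actor waits[NR_ACTORS], enum actor start)
{
    enum actor cur = start;

    for (int steps = 0; steps < NR_ACTORS; steps++) {
        cur = waits[cur];
        if (cur == NOT_WAITING)
            return false;   /* chain ends: someone can make progress */
        if (cur == start)
            return true;    /* came back to where we started: deadlock */
    }
    return false;
}
```

With reported_waits every actor is part of the cycle; with no_block_waits the
chain from kswapd ends at the truncate task, which can finish its update.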

Please see the attachment for a possible patch to fix this problem.


Regards
Hongchao


--- fs/inode.c.orig 2009-01-24 03:28:57.000000000 +0800
+++ fs/inode.c 2009-01-24 03:30:18.000000000 +0800
@@ -418,7 +418,9 @@ static void prune_icache(int nr_to_scan)
 	int nr_scanned;
 	unsigned long reap = 0;

-	mutex_lock(&iprune_mutex);
+	if (!mutex_trylock(&iprune_mutex))
+		return;
+
 	spin_lock(&inode_lock);
 	for (nr_scanned = 0; nr_scanned < nr_to_scan; nr_scanned++) {
 		struct inode *inode;
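
For readers who want to try the idea outside the kernel, below is a rough
user-space sketch of the same "try the lock, otherwise skip the pass" pattern.
It is an assumption-laden analogue, not the kernel code: the lock is modelled
with a C11 atomic_flag rather than a kernel mutex, and try_prune(),
prune_busy and pruned_total are hypothetical names.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Stands in for iprune_mutex: set while some caller is pruning. */
static atomic_flag prune_busy = ATOMIC_FLAG_INIT;

/* Stands in for the real pruning work; only touched while the flag is held. */
static int pruned_total;

/*
 * Run one pruning pass unless another caller is already pruning.
 * Returns true if the pass ran, false if it was skipped.
 */
static bool try_prune(int nr_to_scan)
{
    /* test_and_set returns the previous value: true means "already held". */
    if (atomic_flag_test_and_set(&prune_busy))
        return false;               /* do not block, just give up this pass */

    pruned_total += nr_to_scan;     /* placeholder for scanning the inode list */

    atomic_flag_clear(&prune_busy); /* release for the next caller */
    return true;
}
```

The cost of this approach is that a caller which loses the race frees nothing
from the inode cache in that pass and has to rely on whoever holds the lock.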