[PATCH] kill-the-bkl/reiserfs: fix reiserfs lock to cpu_add_remove_lock dependency

From: Frederic Weisbecker
Date: Mon Oct 05 2009 - 14:13:27 EST


On Tue, Sep 29, 2009 at 02:22:42PM +0400, Alexander Beregalov wrote:
> 2009/9/29 Frederic Weisbecker <fweisbec@xxxxxxxxx>:
> > Yeah indeed, it's about the same kind of thing.
> > Could you please test the following patch?
>
> Thanks, the warning has gone away.


Thanks a lot Alexander, your tests and reports are very precious!

I've pushed the commit below; as usual, it can be found at:

git://git.kernel.org/pub/scm/linux/kernel/git/frederic/random-tracing.git
reiserfs/kill-bkl

---
From: Frederic Weisbecker <fweisbec@xxxxxxxxx>
Date: Mon, 5 Oct 2009 16:31:37 +0200
Subject: [PATCH] kill-the-bkl/reiserfs: fix reiserfs lock to cpu_add_remove_lock dependency

While creating the reiserfs workqueue during journal
initialization, we hold the reiserfs lock, but
create_workqueue() also takes the cpu_add_remove_lock, which
creates the following dependency:

- reiserfs lock -> cpu_add_remove_lock

But we also have the following existing dependencies:

- mm->mmap_sem -> reiserfs lock
- cpu_add_remove_lock -> cpu_hotplug.lock -> slub_lock -> sysfs_mutex

The merged dependency chain then becomes:

- mm->mmap_sem -> reiserfs lock -> cpu_add_remove_lock ->
cpu_hotplug.lock -> slub_lock -> sysfs_mutex

But when we fill a dir entry in sysfs_readdir(), we are holding the
sysfs_mutex and we might also fault while copying the directory entry
to userspace, leading to the following dependency:

- sysfs_mutex -> mm->mmap_sem

The end result is then a lock inversion between sysfs_mutex and
mm->mmap_sem, as reported in the following lockdep warning:

[ INFO: possible circular locking dependency detected ]
2.6.31-07095-g25a3912 #4
-------------------------------------------------------
udevadm/790 is trying to acquire lock:
(&mm->mmap_sem){++++++}, at: [<c1098942>] might_fault+0x72/0xc0

but task is already holding lock:
(sysfs_mutex){+.+.+.}, at: [<c110813c>] sysfs_readdir+0x7c/0x260

which lock already depends on the new lock.

the existing dependency chain (in reverse order) is:

-> #5 (sysfs_mutex){+.+.+.}:
[...]

-> #4 (slub_lock){+++++.}:
[...]

-> #3 (cpu_hotplug.lock){+.+.+.}:
[...]

-> #2 (cpu_add_remove_lock){+.+.+.}:
[...]

-> #1 (&REISERFS_SB(s)->lock){+.+.+.}:
[...]

-> #0 (&mm->mmap_sem){++++++}:
[...]

This can be fixed by relaxing the reiserfs lock while creating the
workqueue.
It is fine to relax the lock here: we only hold it at this point to
pass the reiserfs lock checks and for paranoid reasons.
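
As a rough illustration of the reasoning above, the dependencies listed in
this changelog can be modelled as a directed graph and checked for a cycle.
This is only a toy userspace sketch, not how lockdep is implemented: the
struct lock_graph type and the add_dep(), has_cycle() and
reiserfs_chain_has_cycle() helpers are hypothetical names made up for the
example, and the lock names are plain strings copied from the chains above.

```c
#include <assert.h>
#include <string.h>

#define MAX_LOCKS 16
#define MAX_EDGES 32

/* Hypothetical toy model of a lock-order graph (not lockdep's structures). */
struct lock_graph {
	const char *names[MAX_LOCKS];
	int n_locks;
	int from[MAX_EDGES];
	int to[MAX_EDGES];
	int n_edges;
};

/* Return the index of a lock name, registering it on first use. */
static int lock_id(struct lock_graph *g, const char *name)
{
	for (int i = 0; i < g->n_locks; i++)
		if (strcmp(g->names[i], name) == 0)
			return i;
	assert(g->n_locks < MAX_LOCKS);
	g->names[g->n_locks] = name;
	return g->n_locks++;
}

/* Record the dependency "a is held while b is acquired" (a -> b). */
static void add_dep(struct lock_graph *g, const char *a, const char *b)
{
	int f = lock_id(g, a);
	int t = lock_id(g, b);

	assert(g->n_edges < MAX_EDGES);
	g->from[g->n_edges] = f;
	g->to[g->n_edges] = t;
	g->n_edges++;
}

/* Depth-first search; state: 0 = unvisited, 1 = on the stack, 2 = done. */
static int visit(const struct lock_graph *g, int node, int *state)
{
	state[node] = 1;
	for (int e = 0; e < g->n_edges; e++) {
		if (g->from[e] != node)
			continue;
		if (state[g->to[e]] == 1)
			return 1;	/* back edge: circular dependency */
		if (state[g->to[e]] == 0 && visit(g, g->to[e], state))
			return 1;
	}
	state[node] = 2;
	return 0;
}

static int has_cycle(const struct lock_graph *g)
{
	int state[MAX_LOCKS] = { 0 };

	for (int i = 0; i < g->n_locks; i++)
		if (state[i] == 0 && visit(g, i, state))
			return 1;
	return 0;
}

/*
 * Build the chains from this changelog. with_wq_dep selects whether the
 * "reiserfs lock -> cpu_add_remove_lock" dependency is present, i.e.
 * whether the workqueue is created with the reiserfs lock held.
 */
static int reiserfs_chain_has_cycle(int with_wq_dep)
{
	struct lock_graph g = { 0 };

	add_dep(&g, "mm->mmap_sem", "reiserfs lock");
	if (with_wq_dep)
		add_dep(&g, "reiserfs lock", "cpu_add_remove_lock");
	add_dep(&g, "cpu_add_remove_lock", "cpu_hotplug.lock");
	add_dep(&g, "cpu_hotplug.lock", "slub_lock");
	add_dep(&g, "slub_lock", "sysfs_mutex");
	add_dep(&g, "sysfs_mutex", "mm->mmap_sem");
	return has_cycle(&g);
}
```

In this model, reiserfs_chain_has_cycle(1) finds the circular dependency,
while reiserfs_chain_has_cycle(0) does not: once the reiserfs lock is
released around create_workqueue(), nothing links it to
cpu_add_remove_lock any more and the loop is broken.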

Reported-by: Alexander Beregalov <a.beregalov@xxxxxxxxx>
Tested-by: Alexander Beregalov <a.beregalov@xxxxxxxxx>
Signed-off-by: Frederic Weisbecker <fweisbec@xxxxxxxxx>
Cc: Jeff Mahoney <jeffm@xxxxxxxx>
Cc: Chris Mason <chris.mason@xxxxxxxxxx>
Cc: Ingo Molnar <mingo@xxxxxxx>
Cc: Alexander Beregalov <a.beregalov@xxxxxxxxx>
Cc: Laurent Riffard <laurent.riffard@xxxxxxx>
---
fs/reiserfs/journal.c | 5 ++++-
1 files changed, 4 insertions(+), 1 deletions(-)

diff --git a/fs/reiserfs/journal.c b/fs/reiserfs/journal.c
index 04e3c42..2f8a7e7 100644
--- a/fs/reiserfs/journal.c
+++ b/fs/reiserfs/journal.c
@@ -2933,8 +2933,11 @@ int journal_init(struct super_block *sb, const char *j_dev_name,
 	}
 
 	reiserfs_mounted_fs_count++;
-	if (reiserfs_mounted_fs_count <= 1)
+	if (reiserfs_mounted_fs_count <= 1) {
+		reiserfs_write_unlock(sb);
 		commit_wq = create_workqueue("reiserfs");
+		reiserfs_write_lock(sb);
+	}
 
 	INIT_DELAYED_WORK(&journal->j_work, flush_async_commits);
 	journal->j_work_sb = sb;
--
1.6.2.3


