[PATCH] mm, memory-failure: Fix spinlock vs mutex order

From: Peter Zijlstra
Date: Fri Jun 17 2011 - 15:49:33 EST


On Fri, 2011-06-17 at 20:32 +0200, Peter Zijlstra wrote:
> Aside from the THP thing there's a user in memory-failure.c, which looks
> to be broken as it is because its calling things under tasklist_lock
> which isn't preemptible, but it looks like we can simply swap the
> tasklist_lock vs page_lock_anon_vma.
>

I thought about maybe using rcu, but then thought the thing is probably
wanting to exclude new tasks as it wants to kill all mm users.

---
Subject: mm, memory-failure: Fix spinlock vs mutex order

We cannot take a mutex while holding a spinlock, so flip the order as
its documented to be random.

Signed-off-by: Peter Zijlstra <a.p.zijlstra@xxxxxxxxx>
---
mm/memory-failure.c | 21 ++++++---------------
mm/rmap.c | 5 ++---
2 files changed, 8 insertions(+), 18 deletions(-)

diff --git a/mm/memory-failure.c b/mm/memory-failure.c
index eac0ba5..740c4f5 100644
--- a/mm/memory-failure.c
+++ b/mm/memory-failure.c
@@ -391,10 +391,11 @@ static void collect_procs_anon(struct page *page, struct list_head *to_kill,
struct task_struct *tsk;
struct anon_vma *av;

- read_lock(&tasklist_lock);
av = page_lock_anon_vma(page);
if (av == NULL) /* Not actually mapped anymore */
- goto out;
+ return;
+
+ read_lock(&tasklist_lock);
for_each_process (tsk) {
struct anon_vma_chain *vmac;

@@ -408,9 +409,8 @@ static void collect_procs_anon(struct page *page, struct list_head *to_kill,
add_to_kill(tsk, page, vma, to_kill, tkc);
}
}
- page_unlock_anon_vma(av);
-out:
read_unlock(&tasklist_lock);
+ page_unlock_anon_vma(av);
}

/*
@@ -424,17 +424,8 @@ static void collect_procs_file(struct page *page, struct list_head *to_kill,
struct prio_tree_iter iter;
struct address_space *mapping = page->mapping;

- /*
- * A note on the locking order between the two locks.
- * We don't rely on this particular order.
- * If you have some other code that needs a different order
- * feel free to switch them around. Or add a reverse link
- * from mm_struct to task_struct, then this could be all
- * done without taking tasklist_lock and looping over all tasks.
- */
-
- read_lock(&tasklist_lock);
mutex_lock(&mapping->i_mmap_mutex);
+ read_lock(&tasklist_lock);
for_each_process(tsk) {
pgoff_t pgoff = page->index << (PAGE_CACHE_SHIFT - PAGE_SHIFT);

@@ -454,8 +445,8 @@ static void collect_procs_file(struct page *page, struct list_head *to_kill,
add_to_kill(tsk, page, vma, to_kill, tkc);
}
}
- mutex_unlock(&mapping->i_mmap_mutex);
read_unlock(&tasklist_lock);
+ mutex_unlock(&mapping->i_mmap_mutex);
}

/*
diff --git a/mm/rmap.c b/mm/rmap.c
index 0eb463e..5e51855 100644
--- a/mm/rmap.c
+++ b/mm/rmap.c
@@ -38,9 +38,8 @@
* in arch-dependent flush_dcache_mmap_lock,
* within inode_wb_list_lock in __sync_single_inode)
*
- * (code doesn't rely on that order so it could be switched around)
- * ->tasklist_lock
- * anon_vma->mutex (memory_failure, collect_procs_anon)
+ * anon_vma->mutex,mapping->i_mutex (memory_failure, collect_procs_anon)
+ * ->tasklist_lock
* pte map lock
*/



--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/