[PATCH] mm: fix s390 BUG by __set_page_dirty_no_writeback on swap

From: Hugh Dickins
Date: Mon Apr 23 2012 - 14:15:21 EST

Mel reports a BUG_ON(slot == NULL) in radix_tree_tag_set() on s390 3.0.13:
called from __set_page_dirty_nobuffers() when page_remove_rmap() tries to
transfer the dirty flag from the s390 storage key to struct page and
radix_tree.

That would be because of reclaim's shrink_page_list() calling add_to_swap()
on this page at the same time: first PageSwapCache is set (causing
page_mapping(page) to appear as &swapper_space), then page->private set,
then tree_lock taken, then page inserted into radix_tree - so there's
an interval before taking the lock when the radix_tree slot is empty.
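
To make that ordering concrete, here is a rough single-threaded userspace
model of the window. It is only a sketch, not kernel code: every toy_*
name is hypothetical, a fixed array stands in for the radix_tree, and a
plain bool stands in for tree_lock.

```c
#include <stdbool.h>
#include <stddef.h>

#define TOY_SLOTS 8

struct toy_page {
	bool swapcache;        /* models PageSwapCache */
	unsigned long private; /* models page->private (the swap entry) */
};

struct toy_swap_space {
	bool locked;                       /* models tree_lock being held */
	struct toy_page *slots[TOY_SLOTS]; /* stands in for the radix_tree */
};

/* The steps of adding a page to swap cache, in the order described above. */
enum toy_step {
	TOY_START,
	TOY_FLAG_SET,    /* PageSwapCache set */
	TOY_PRIVATE_SET, /* page->private set */
	TOY_LOCK_TAKEN,  /* tree_lock taken */
	TOY_INSERTED,    /* page inserted, lock dropped */
};

/* Replay the sequence up to and including step 'upto'. */
static void toy_add_to_swap_cache(struct toy_swap_space *s, struct toy_page *p,
				  unsigned long entry, enum toy_step upto)
{
	if (upto >= TOY_FLAG_SET)
		p->swapcache = true;
	if (upto >= TOY_PRIVATE_SET)
		p->private = entry;
	if (upto >= TOY_LOCK_TAKEN)
		s->locked = true;
	if (upto >= TOY_INSERTED) {
		s->slots[entry % TOY_SLOTS] = p;
		s->locked = false;
	}
}

/*
 * What a concurrent dirtier would find if it ran right now: the page
 * already looks like a swap cache page, the lock is free to take, and
 * yet the slot it wants to tag is still empty.
 */
static bool toy_dirtier_would_hit_bug(const struct toy_swap_space *s,
				      const struct toy_page *p)
{
	if (!p->swapcache)
		return false; /* mapping does not look like swap yet */
	if (s->locked)
		return false; /* dirtier would wait for the lock instead */
	return s->slots[p->private % TOY_SLOTS] == NULL;
}
```

In this model the check is true only between TOY_FLAG_SET and
TOY_LOCK_TAKEN, which is the interval described above.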

We could fix this by moving __add_to_swap_cache()'s spin_lock_irq up
before the SetPageSwapCache. But a better fix is simply to do what's
five years overdue: Ken Chen introduced __set_page_dirty_no_writeback()
(if !PageDirty TestSetPageDirty) for tmpfs to skip all the radix_tree
overhead, and swap is just the same - it ignores the radix_tree tag,
and does not participate in dirty page accounting, so should be using
__set_page_dirty_no_writeback() too.
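
For reference, the "if !PageDirty TestSetPageDirty" logic can be sketched
in userspace as below. This is an illustrative model under assumptions,
not the kernel source: the toy_* names and the single flag bit are
hypothetical, and C11 atomics stand in for the kernel's page flag bitops.
The point it shows is that no tree lookup or tagging is involved, so an
empty radix_tree slot cannot matter.

```c
#include <stdatomic.h>
#include <stdbool.h>

#define TOY_PG_DIRTY 1u /* hypothetical bit standing in for PG_dirty */

struct toy_page {
	atomic_uint flags;
};

/* Models PageDirty(): a plain read of the dirty bit. */
static bool toy_page_dirty(struct toy_page *p)
{
	return atomic_load(&p->flags) & TOY_PG_DIRTY;
}

/* Models TestSetPageDirty(): set the bit, return its previous value. */
static bool toy_test_set_page_dirty(struct toy_page *p)
{
	return atomic_fetch_or(&p->flags, TOY_PG_DIRTY) & TOY_PG_DIRTY;
}

/*
 * Returns 1 if this call newly dirtied the page, 0 if it was already
 * dirty. The cheap read comes first so that an already dirty page is
 * not written to at all.
 */
static int toy_set_page_dirty_no_writeback(struct toy_page *p)
{
	if (!toy_page_dirty(p))
		return !toy_test_set_page_dirty(p);
	return 0;
}
```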

s390 testing now confirms that this does indeed fix the problem.

Reported-by: Mel Gorman <mgorman@xxxxxxx>
Signed-off-by: Hugh Dickins <hughd@xxxxxxxxxx>
Acked-by: Mel Gorman <mgorman@xxxxxxx>
Cc: Andrew Morton <akpm@xxxxxxxxxxxxxxxxxxxx>
Cc: Martin Schwidefsky <schwidefsky@xxxxxxxxxx>
Cc: Heiko Carstens <heiko.carstens@xxxxxxxxxx>
Cc: Rik van Riel <riel@xxxxxxxxxx>
Cc: Ken Chen <kenchen@xxxxxxxxxx>
Cc: stable@xxxxxxxxxxxxxxx

mm/swap_state.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)

--- 3.4-git/mm/swap_state.c 2012-03-31 17:42:26.949729938 -0700
+++ linux/mm/swap_state.c 2012-04-17 15:34:05.732086663 -0700
@@ -26,7 +26,7 @@
 static const struct address_space_operations swap_aops = {
 	.writepage	= swap_writepage,
-	.set_page_dirty	= __set_page_dirty_nobuffers,
+	.set_page_dirty	= __set_page_dirty_no_writeback,
 	.migratepage	= migrate_page,

