[PATCH 45/45] btrfs: fix race on syncing the btree inode

From: Wu Fengguang
Date: Wed Oct 07 2009 - 04:04:47 EST


When doing sync, the btree dirty pages refuse to go away for tens of seconds:

# vmmon -d 1 nr_writeback nr_dirty nr_unstable

nr_writeback nr_dirty nr_unstable
46641 23315 0
46641 23380 0
46641 23381 0
26674 43206 0
18963 51006 0
11252 58721 0
3528 66419 0
0 70024 0
0 70024 0
0 70024 0
0 70024 0
0 70024 0
0 70024 0
0 70024 0
0 70024 0

Note that the 70024 pages are under the btree inode's 32MB
no-write-metadata threshold, so non-sync writeback skips them.
This is racy: for data integrity, the sync work has to keep
sleeping and retrying until the pages are finally written.

The 32MB threshold may also become a problem for background
writeback on a memory-tight box. So it may be better to
replace the threshold with some informed writeback tricks.
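
For illustration only, the threshold check that this patch removes can be
modelled in userspace roughly as below. This is a sketch, not kernel code:
the `model_` names are hypothetical stand-ins, and only the 32MB constant
and the order of the checks are taken from the removed lines in the diff.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-ins for the kernel's WB_SYNC_NONE / WB_SYNC_ALL. */
enum model_sync_mode { MODEL_WB_SYNC_NONE, MODEL_WB_SYNC_ALL };

/* Same value as the "thresh" local removed by the patch. */
#define MODEL_THRESH_BYTES (32ULL * 1024 * 1024)

/*
 * Returns true when the old btree_writepages() would have returned 0
 * without calling extent_writepages().
 */
bool model_old_skips_writeback(enum model_sync_mode mode,
			       bool for_kupdate,
			       uint64_t dirty_metadata_bytes)
{
	if (mode != MODEL_WB_SYNC_NONE)
		return false;		/* integrity writeback always proceeds */
	if (for_kupdate)
		return true;		/* periodic writeback never wrote metadata */
	return dirty_metadata_bytes < MODEL_THRESH_BYTES;
}

/* Small self-check of the model. */
void model_old_selfcheck(void)
{
	/* 16MB dirty, non-integrity writeback: skipped. */
	assert(model_old_skips_writeback(MODEL_WB_SYNC_NONE, false,
					 16ULL * 1024 * 1024));
	/* Integrity writeback is never skipped. */
	assert(!model_old_skips_writeback(MODEL_WB_SYNC_ALL, false, 0));
}
```

In this model, any WB_SYNC_NONE pass that finds less than 32MB of dirty
metadata writes nothing, which matches the stalled nr_dirty column above.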

CC: Chris Mason <chris.mason@xxxxxxxxxx>
Signed-off-by: Wu Fengguang <fengguang.wu@xxxxxxxxx>
---
fs/btrfs/disk-io.c | 29 +++++++++++++----------------
1 file changed, 13 insertions(+), 16 deletions(-)

--- linux.orig/fs/btrfs/disk-io.c 2009-10-07 14:31:45.000000000 +0800
+++ linux/fs/btrfs/disk-io.c 2009-10-07 14:32:55.000000000 +0800
@@ -707,22 +707,19 @@ static int btree_writepage(struct page *
 static int btree_writepages(struct address_space *mapping,
 			    struct writeback_control *wbc)
 {
-	struct extent_io_tree *tree;
-	tree = &BTRFS_I(mapping->host)->io_tree;
-	if (wbc->sync_mode == WB_SYNC_NONE) {
-		struct btrfs_root *root = BTRFS_I(mapping->host)->root;
-		u64 num_dirty;
-		unsigned long thresh = 32 * 1024 * 1024;
-
-		if (wbc->for_kupdate)
-			return 0;
-
-		/* this is a bit racy, but that's ok */
-		num_dirty = root->fs_info->dirty_metadata_bytes;
-		if (num_dirty < thresh)
-			return 0;
-	}
-	return extent_writepages(tree, mapping, btree_get_extent, wbc);
+	struct extent_io_tree *tree = &BTRFS_I(mapping->host)->io_tree;
+	int ret;
+
+	if (!wbc->for_sync)
+		wbc->nr_segments = 1;
+	ret = extent_writepages(tree, mapping, btree_get_extent, wbc);
+	/*
+	 * Fake some skipped pages, so that VFS won't
+	 * try hard on writing this inode.
+	 */
+	if (!wbc->for_sync)
+		wbc->pages_skipped++;
+	return ret;
 }

 static int btree_readpage(struct file *file, struct page *page)
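
The control flow of the new btree_writepages() can be exercised outside the
kernel with a reduced model such as the one below. It is only a sketch under
stated assumptions: `struct model_wbc` is a hypothetical stand-in that keeps
just the three writeback_control fields the diff touches (for_sync,
nr_segments, pages_skipped), and `model_extent_writepages()` is a stub that
pretends the write succeeded.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical, reduced stand-in for struct writeback_control. */
struct model_wbc {
	bool for_sync;		/* true when called on behalf of sync */
	long nr_segments;	/* how much work this pass may do */
	long pages_skipped;	/* pages the filesystem declined to write */
};

/* Stub for extent_writepages(); assumed to succeed in this model. */
static int model_extent_writepages(struct model_wbc *wbc)
{
	(void)wbc;
	return 0;
}

/* Mirrors the added lines of the diff, nothing more. */
int model_btree_writepages(struct model_wbc *wbc)
{
	int ret;

	if (!wbc->for_sync)
		wbc->nr_segments = 1;	/* limit non-sync passes */
	ret = model_extent_writepages(wbc);
	/* Report a skipped page so the caller does not push this inode hard. */
	if (!wbc->for_sync)
		wbc->pages_skipped++;
	return ret;
}

/* Small self-check: a non-sync pass is limited and reports a skip. */
void model_new_selfcheck(void)
{
	struct model_wbc bg = { .for_sync = false, .nr_segments = 8,
				.pages_skipped = 0 };

	assert(model_btree_writepages(&bg) == 0);
	assert(bg.nr_segments == 1);
	assert(bg.pages_skipped == 1);
}
```

In this model a sync pass leaves nr_segments and pages_skipped untouched,
while every non-sync pass still calls into the write path instead of
returning early as the old threshold check did.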

