Re: [PATCH 0/7] Per-bdi writeback flusher threads v20

From: Wu Fengguang
Date: Sat Sep 19 2009 - 11:04:14 EST


On Sat, Sep 19, 2009 at 12:26:07PM +0800, Wu Fengguang wrote:
> On Sat, Sep 19, 2009 at 12:00:51PM +0800, Wu Fengguang wrote:
> > On Sat, Sep 19, 2009 at 11:58:35AM +0800, Wu Fengguang wrote:
> > > On Sat, Sep 19, 2009 at 01:52:52AM +0800, Theodore Tso wrote:
> > > > On Fri, Sep 11, 2009 at 10:39:29PM +0800, Wu Fengguang wrote:
> > > > >
> > > > > That would be good. Sorry for the delay. I'll set aside some time
> > > > > in the middle of next week to help review and benchmark the recent
> > > > > writeback work, and hope to get things done in this merge window.
> > > >
> > > > Did you have a chance to get more work done on your writeback
> > > > patches?
> > >
> > > Sorry for the delay, I'm now testing the patches with commands
> > >
> > > cp /dev/zero /mnt/test/zero0 &
> > > dd if=/dev/zero of=/mnt/test/zero1 &
> > >
> > > and the attached debug patch.
> > >
> > > One problem I found with ext3/4 is that redirty_tail() is called
> > > repeatedly in the traces, which can slow down inode writeback
> > > significantly.
> >
> > FYI, it's this redirty_tail() called in writeback_single_inode():
> >
> > /*
> > * Someone redirtied the inode while were writing back
> > * the pages.
> > */
> > redirty_tail(inode);
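
For reference, redirty_tail() re-stamps the inode's dirtied_when (when
needed) and moves it back to the "just dirtied" end of the queue.  A
rough sketch, paraphrased from the mainline fs/fs-writeback.c of this
era (the per-bdi series keeps the list on the bdi_writeback instead of
the superblock, but the effect is the same):

	static void redirty_tail(struct inode *inode)
	{
		struct super_block *sb = inode->i_sb;

		if (!list_empty(&sb->s_dirty)) {
			struct inode *tail_inode;

			/*
			 * s_dirty is kept in reverse time order; if this
			 * inode is older than the newest entry, re-stamp
			 * it so the ordering stays intact.
			 */
			tail_inode = list_entry(sb->s_dirty.next,
						struct inode, i_list);
			if (time_before(inode->dirtied_when,
					tail_inode->dirtied_when))
				inode->dirtied_when = jiffies;
		}
		list_move(&inode->i_list, &sb->s_dirty);
	}

Since kupdate-style writeback only picks up inodes that have been dirty
for at least dirty_expire_centisecs (30s by default), a refreshed
dirtied_when is enough to park the inode for up to 30 seconds.
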
>
> Hmm, this looks like an old problem blown out of proportion by the
> 128MB MAX_WRITEBACK_PAGES.
>
> The inode was redirtied by the busy cp/dd processes. Now it takes much
> more time to sync 128MB, so that a heavy dirtier can easily redirty
> the inode in that time window.
>
> A single invocation of redirty_tail() can hold up writeback of the
> current inode for up to 30 seconds.
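
To put rough numbers on it: 128MB is 32768 4k pages, and at an assumed
~50MB/s of effective disk throughput (an illustrative figure, not a
measurement) one such chunk takes about 2.5 seconds to sync.  A cp/dd
dirtying pages at memory speed will practically always redirty the
inode within that window, so the redirty_tail() above is hit on almost
every pass, each time pushing the inode out for up to another expire
interval.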

It seems that this patch helps. However, I'm afraid it's too late to
risk merging this kind of patch now.

Thanks,
Fengguang
---

writeback: don't delay redirtied inode by a fast dirtier

The large 128MB MAX_WRITEBACK_PAGES greatly increases the chance of an
inode being redirtied by a fast dirtier while it is under writeback.

We used to call redirty_tail() in this case, which can delay inode
writeback for up to 30s. That is no longer acceptable, even for a
simple dd.

But still delay these cases:
- only inode metadata is dirtied (by the fs)
- the writeback_index wrapped around during this pass
  (to protect against fast dirtiers that do repeated overwrites;
  see the sketch after the diffstat below)

CC: Jan Kara <jack@xxxxxxx>
CC: Theodore Ts'o <tytso@xxxxxxx>
CC: Dave Chinner <david@xxxxxxxxxxxxx>
CC: Jens Axboe <jens.axboe@xxxxxxxxxx>
CC: Chris Mason <chris.mason@xxxxxxxxxx>
CC: Christoph Hellwig <hch@xxxxxxxxxxxxx>
Signed-off-by: Wu Fengguang <fengguang.wu@xxxxxxxxx>
---
fs/fs-writeback.c | 18 ++++++++++++++----
1 file changed, 14 insertions(+), 4 deletions(-)
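
In short, the intended flow after this change (a simplified sketch, not
the literal code; see the diff below for the real thing):

	if (inode->i_state & I_DIRTY_PAGES) {
		/* a fast dirtier added more pages: don't delay the inode */
		goto select_queue;
	} else if (inode->i_state & I_DIRTY) {
		/* only metadata was redirtied (e.g. XFS delalloc/isize) */
		redirty_tail(inode);
	} else if (mapping_tagged(mapping, PAGECACHE_TAG_DIRTY)) {
		inode->i_state |= I_DIRTY_PAGES;
select_queue:
		if (wbc->encountered_congestion ||
		    wbc->nr_to_write <= 0 ||
		    writeback_index < mapping->writeback_index)
			/* slice used up, or still making forward progress:
			   keep the inode queued for the next round */
			requeue_io(inode);
		else
			/* blocked, or writeback_index wrapped around
			   (a busy rewriter): delay via redirty_tail() */
			redirty_tail(inode);
	}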

--- linux.orig/fs/fs-writeback.c 2009-09-19 18:09:50.000000000 +0800
+++ linux/fs/fs-writeback.c 2009-09-19 19:00:18.000000000 +0800
@@ -466,6 +466,7 @@ writeback_single_inode(struct inode *ino
long last_file_written;
long nr_to_write;
unsigned dirty;
+ pgoff_t writeback_index;
int ret;

if (!atomic_read(&inode->i_count))
@@ -508,6 +509,7 @@ writeback_single_inode(struct inode *ino
last_file_written = wbc->last_file_written;
wbc->nr_to_write -= last_file_written;
nr_to_write = wbc->nr_to_write;
+ writeback_index = mapping->writeback_index;

ret = do_writepages(mapping, wbc);

@@ -534,10 +536,15 @@ writeback_single_inode(struct inode *ino
spin_lock(&inode_lock);
inode->i_state &= ~I_SYNC;
if (!(inode->i_state & (I_FREEING | I_CLEAR))) {
- if (inode->i_state & I_DIRTY) {
+ if (inode->i_state & I_DIRTY_PAGES) {
/*
- * Someone redirtied the inode while were writing back
- * the pages.
+ * More pages get dirtied by a fast dirtier.
+ */
+ goto select_queue;
+ } else if (inode->i_state & I_DIRTY) {
+ /*
+ * At least XFS will redirty the inode during the
+ * writeback (delalloc) and on io completion (isize).
*/
redirty_tail(inode);
} else if (mapping_tagged(mapping, PAGECACHE_TAG_DIRTY)) {
@@ -546,8 +553,10 @@ writeback_single_inode(struct inode *ino
* sometimes bales out without doing anything.
*/
inode->i_state |= I_DIRTY_PAGES;
+select_queue:
if (wbc->encountered_congestion ||
- wbc->nr_to_write <= 0) {
+ wbc->nr_to_write <= 0 ||
+ writeback_index < mapping->writeback_index) {
/*
* if slice used up, queue for next round;
* otherwise continue this inode after return
@@ -556,6 +565,7 @@ writeback_single_inode(struct inode *ino
} else {
/*
* somehow blocked: retry later
+ * also protect against busy rewrites.
*/
redirty_tail(inode);
}
--