Re: [PATCH] bdi_sync_writeback should WB_SYNC_NONE first

From: Andrew Morton
Date: Sun Sep 27 2009 - 13:10:52 EST


On Sun, 27 Sep 2009 18:55:14 +0200 Jens Axboe <jens.axboe@xxxxxxxxxx> wrote:

> > I wasn't referring to this patch actually. The code as it stands in
> > Linus's tree right now attempts to write back up to 2^63 pages...
>
> I agree, it could make the fs sync take a looong time. This is not a new
> issue, though.

It _should_ be a new issue. The old code would estimate the number of
dirty pages up-front and would then add a +50% fudge factor, so if we
started the sync with 1GB dirty memory, we write back a max of 1.5GB.

However that might have got broken.

void sync_inodes_sb(struct super_block *sb, int wait)
{
	struct writeback_control wbc = {
		.sync_mode	= wait ? WB_SYNC_ALL : WB_SYNC_NONE,
		.range_start	= 0,
		.range_end	= LLONG_MAX,
	};

	if (!wait) {
		unsigned long nr_dirty = global_page_state(NR_FILE_DIRTY);
		unsigned long nr_unstable = global_page_state(NR_UNSTABLE_NFS);

		wbc.nr_to_write = nr_dirty + nr_unstable +
			(inodes_stat.nr_inodes - inodes_stat.nr_unused);
	} else
		wbc.nr_to_write = LONG_MAX;	/* doesn't actually matter */

	sync_sb_inodes(sb, &wbc);
}

a) the +50% isn't there in 2.6.31

b) the wait=true case appears to be vulnerable to livelock in 2.6.31.
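A toy model of why an effectively unbounded nr_to_write could fail to terminate when something keeps dirtying pages as fast as they are written. This is a userspace sketch under simplifying assumptions, not kernel code: simulate_sync() and all of its parameters are hypothetical, and real writeback is far more involved.

```c
#include <limits.h>

/*
 * Toy model (hypothetical, not kernel code): each pass writes back up
 * to write_per_pass pages, bounded by the remaining nr_to_write budget,
 * and then a concurrent dirtier re-dirties redirty_per_pass pages.
 *
 * Returns the number of passes taken before the sync stopped, or -1 if
 * it was still running after max_passes.
 */
static long simulate_sync(long dirty, long write_per_pass,
			  long redirty_per_pass, long nr_to_write,
			  long max_passes)
{
	long pass;

	for (pass = 0; pass < max_passes; pass++) {
		long n = write_per_pass;

		if (dirty == 0 || nr_to_write == 0)
			return pass;	/* nothing left, or budget used up */
		if (n > dirty)
			n = dirty;
		if (n > nr_to_write)
			n = nr_to_write;
		dirty -= n;
		nr_to_write -= n;
		dirty += redirty_per_pass;	/* dirtier keeps going */
	}
	return -1;	/* never finished within max_passes */
}
```

In this model, a budget of 1500 pages against 1000 dirty pages stops after 15 passes even if the dirtier keeps pace with writeback, whereas a budget of LONG_MAX with the same dirtier never stops.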

whodidthat

38f21977663126fef53f5585e7f1653d8ebe55c4 did that back in January.