Re: [PATCH 2/7] ext3: call blkdev_issue_flush() on fsync()

From: Chris Mason
Date: Mon Mar 30 2009 - 10:21:58 EST


On Mon, 2009-03-30 at 10:04 -0400, Theodore Tso wrote:
> On Mon, Mar 30, 2009 at 09:11:58PM +0900, Fernando Luis VÃzquez Cao wrote:
> > To ensure that bits are truly on-disk after an fsync or fdatasync, we
> > should force a disk flush explicitly when there is dirty data/metadata
> > and the journal didn't emit a write barrier (either because metadata is
> > not being synched or barriers are disabled).
>
> NACK.
>
> As Eric commented on linux-ext4 (and I think it was Chris Mason
> deserves the credit for originally pointing this out), we don't need
> to call blkdev_issue_flush() after calling sync_inode(). That's
> because sync_inode() eventually (after going through a very deep call
> tree inside fs/fs-writeback.c) __sync_single_inode(), which calls
> write_inode(), which calls the filesystem-specific ->write_inode()
> function, which for both ext3 and ext4, ends up calling
> ext[34]_force_commit. Which, if barriers are enabled, will end up
> issuing a barrier after writing the commit block.
>
> In the code paths that don't end up calling sync_inode() or
> ext4_force_commit(), (i.e., in the fdatasync() case) calling
> block_flush_device is appropriate. But as it stands, this patch (and
> the related one for ext4) will result in multiple unnecessary barrier
> requests being sent to the block layer.
>

I'm not sure we want to stick Fernando with changing how barriers are
done in individual filesystems, his patch is just changing the existing
call points.

The ext34 code is especially tricky because there's no way to tell if a
commit was actually done by sync_inode, so there's no way to know if an
extra flush is really required.

I think we'll be better off if he keeps the existing logic and a
different patch set is made for tuning the ext3 and ext4 code.

-chris


--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/