Re: [PATCH 10/11] writeback: splice dirty inode entries to defaultbdi on bdi_destroy()

From: Jan Kara
Date: Thu Sep 17 2009 - 05:34:02 EST


On Wed 16-09-09 20:31:29, Jens Axboe wrote:
> On Wed, Sep 16 2009, Jan Kara wrote:
> > On Wed 16-09-09 15:21:08, Jens Axboe wrote:
> > > On Wed, Sep 16 2009, Jan Kara wrote:
> > > > On Tue 15-09-09 20:16:56, Jens Axboe wrote:
> > > > > We cannot safely ensure that the inodes are all gone at this point
> > > > > in time, and we must not destroy this bdi with inodes having off it.
> > > > ^^^ hanging
> > > >
> > > > > So just splice our entries to the default bdi since that one will
> > > > > always persist.
> > > > BTW: Why can't we make sure all inodes on the BDI are clean when we
> > > > destroy it? Common sense would suggest that we better should be able to do
> > > > it :).
> > > > Maybe it's because most users of private BDI do not call bdi_unregister
> > > > but rather directly bdi_destroy? Is this correct behavior?
> > > Not sure yet, it's on the TODO. This basically works around the problem
> > > for now at least. With dm at least, I'm seeing inodes still hanging off
> > > the bdi after we have done a sync_blockdev(bdev, 1);.
> > Do you really mean sync_blockdev() or fsync_bdev()? Because the first one
> > just syncs the blockdev's mapping, not the filesystem...
>
> Do we want a fsync_bdev() in __blkdev_put()? It's only doing
No, we cannot call fsync_bdev() there because nothing really guarantees
that there exists any filesystem on the device and that it is set up enough
to handle IO - __blkdev_put() is called e.g. after the filesystem has been
cleaned up in ->put_super(). You can have a look at what the code in
generic_shutdown_super() looks like. The function is called when the user
has no chance of dirtying any more data. In particular, the
sync_filesystem() call there should write everything to disk. If it does
not, it's a bug. ->put_super() can dirty some data again, but only buffers
of the underlying blockdev (e.g. when writing bitmaps, superblock etc.). If
the ->put_super() method of some filesystem leaves some inodes dirty, it's
a bug - we'd see the "VFS: Busy inodes after unmount" message.
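
To make the ordering argument concrete, here is a rough userspace toy model
of the sequence described above. It is only a sketch and not kernel code:
struct toy_sb, the toy_* helpers and the two example ->put_super() methods
are hypothetical names made up for illustration, and the counters merely
stand in for dirty inodes and dirty blockdev buffers.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical stand-in for a superblock; NOT the kernel's struct super_block. */
struct toy_sb {
	int dirty_inodes;        /* inodes that still have unwritten data */
	int dirty_bdev_buffers;  /* dirty buffers of the underlying blockdev */
	void (*put_super)(struct toy_sb *sb);
};

/* Models sync_filesystem(): writes out inodes and blockdev buffers. */
static void toy_sync_filesystem(struct toy_sb *sb)
{
	assert(sb != NULL);
	sb->dirty_inodes = 0;
	sb->dirty_bdev_buffers = 0;
}

/* Models sync_blockdev(): only the blockdev's own mapping is written. */
static void toy_sync_blockdev(struct toy_sb *sb)
{
	assert(sb != NULL);
	sb->dirty_bdev_buffers = 0;
}

/* A well-behaved ->put_super(): dirties only blockdev buffers. */
static void well_behaved_put_super(struct toy_sb *sb)
{
	sb->dirty_bdev_buffers += 2;	/* e.g. bitmaps and superblock */
}

/* A buggy ->put_super(): also leaves an inode dirty. */
static void buggy_put_super(struct toy_sb *sb)
{
	sb->dirty_bdev_buffers += 2;
	sb->dirty_inodes += 1;
}

/* Models the unmount path; returns true if no inodes were left behind. */
static bool toy_shutdown_super(struct toy_sb *sb)
{
	toy_sync_filesystem(sb);	/* user can no longer dirty data */
	if (sb->put_super)
		sb->put_super(sb);	/* may redirty blockdev buffers only */
	if (sb->dirty_inodes) {
		fprintf(stderr, "toy: busy inodes after unmount\n");
		return false;
	}
	return true;
}

/* Models the last close of the device, which runs after the unmount path. */
static void toy_blkdev_put(struct toy_sb *sb)
{
	toy_sync_blockdev(sb);
}
```

In this model, a filesystem with a well-behaved ->put_super() reaches
toy_blkdev_put() with only blockdev buffers dirty, so syncing the blockdev
is sufficient there. Only the buggy variant leaves an inode behind, and it
is flagged before the device is closed.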

> sync_blockdev() on last close, and dm wants to tear down the device at
> that point. So either dm needs to really flush the device when going
> readonly, or we need to strengthen the 'flush on last close'.
Yes, but at the time __blkdev_put() is called, there should be no dirty
inodes as I've argued above. So I still don't quite get how there could be
any :)

Honza
--
Jan Kara <jack@xxxxxxx>
SUSE Labs, CR