Re: [PATCH 6/6] kill-the-BKL/reiserfs: release the write lock onflush_commit_list()

From: Chris Mason
Date: Fri May 01 2009 - 09:55:00 EST


On Fri, 2009-05-01 at 15:29 +0200, Ingo Molnar wrote:
> * Chris Mason <chris.mason@xxxxxxxxxx> wrote:
>
> > On Fri, 2009-05-01 at 15:13 +0200, Frederic Weisbecker wrote:
> > > On Fri, May 01, 2009 at 07:42:47AM +0200, Ingo Molnar wrote:
> > > >
> > > > * Frederic Weisbecker <fweisbec@xxxxxxxxx> wrote:
> > > >
> > > > > flush_commit_list() uses ll_rw_block() to commit the pending log blocks.
> > > > > ll_rw_block() might sleep, and the bkl was released at this point. Then
> > > > > we can also relax the write lock at this point.
> > > > >
> > > > > [ Impact: release the reiserfs write lock when it is not needed ]
> > > > >
> > > > > Cc: Jeff Mahoney <jeffm@xxxxxxxx>
> > > > > Cc: Chris Mason <chris.mason@xxxxxxxxxx>
> > > > > Cc: Alexander Beregalov <a.beregalov@xxxxxxxxx>
> > > > > Signed-off-by: Frederic Weisbecker <fweisbec@xxxxxxxxx>
> > > > > ---
> > > > > fs/reiserfs/journal.c | 7 +++++--
> > > > > 1 files changed, 5 insertions(+), 2 deletions(-)
> > > > >
> > > > > diff --git a/fs/reiserfs/journal.c b/fs/reiserfs/journal.c
> > > > > index 373d080..b1ebd5a 100644
> > > > > --- a/fs/reiserfs/journal.c
> > > > > +++ b/fs/reiserfs/journal.c
> > > > > @@ -1120,8 +1120,11 @@ static int flush_commit_list(struct super_block *s,
> > > > > SB_ONDISK_JOURNAL_SIZE(s);
> > > > > tbh = journal_find_get_block(s, bn);
> > > > > if (tbh) {
> > > > > - if (buffer_dirty(tbh))
> > > > > - ll_rw_block(WRITE, 1, &tbh) ;
> > > > > + if (buffer_dirty(tbh)) {
> > > > > + reiserfs_write_unlock(s);
> > > > > + ll_rw_block(WRITE, 1, &tbh);
> > > > > + reiserfs_write_lock(s);
> > > > > + }
> > > > > put_bh(tbh) ;
> > > > > }
> > > > > }
> > > >
> > > > there's 7 other instances of ll_rw_block():
> > > >
> > > > fs/reiserfs/journal.c- spin_unlock(lock);
> > > > fs/reiserfs/journal.c: ll_rw_block(WRITE, 1, &bh);
> > > > fs/reiserfs/journal.c- spin_lock(lock);
> > > > --
> > > > fs/reiserfs/journal.c- reiserfs_write_unlock(s);
> > > > fs/reiserfs/journal.c: ll_rw_block(WRITE, 1, &tbh);
> > > > fs/reiserfs/journal.c- reiserfs_write_lock(s);
> > > > --
> > > > fs/reiserfs/journal.c- /* read in the log blocks, memcpy to the corresponding real block */
> > > > fs/reiserfs/journal.c: ll_rw_block(READ, get_desc_trans_len(desc), log_blocks);
> > > > fs/reiserfs/journal.c- for (i = 0; i < get_desc_trans_len(desc); i++) {
> > > > --
> > > > fs/reiserfs/journal.c- set_buffer_dirty(real_blocks[i]);
> > > > fs/reiserfs/journal.c: ll_rw_block(SWRITE, 1, real_blocks + i);
> > > > fs/reiserfs/journal.c- }
> > > > --
> > > > fs/reiserfs/journal.c- }
> > > > fs/reiserfs/journal.c: ll_rw_block(READ, j, bhlist);
> > > > fs/reiserfs/journal.c- for (i = 1; i < j; i++)
> > > > --
> > > > fs/reiserfs/stree.c- if (!buffer_uptodate(bh[j]))
> > > > fs/reiserfs/stree.c: ll_rw_block(READA, 1, bh + j);
> > > > fs/reiserfs/stree.c- brelse(bh[j]);
> > > > --
> > > > fs/reiserfs/stree.c- reada_blocks, reada_count);
> > > > fs/reiserfs/stree.c: ll_rw_block(READ, 1, &bh);
> > > > fs/reiserfs/stree.c- reiserfs_write_unlock(sb);
> > > > --
> > > > fs/reiserfs/super.c-{
> > > > fs/reiserfs/super.c: ll_rw_block(READ, 1, &(SB_BUFFER_WITH_SB(s)));
> > > > fs/reiserfs/super.c- reiserfs_write_unlock(s);
> > > >
> > > > in particular the second stree.c one and the super.c has a
> > > > write-unlock straight before the lock-drop.
> > > >
> > > > I think the stree.c unlock could be moved to before the
> > > > ll_rw_block() call straight away.
> > >
> > >
> > > Indeed.
> > >
> > >
> > > >
> > > > The super.c one needs more care: first put &(SB_BUFFER_WITH_SB(s))
> > > > into a local variable, then unlock the wite-lock, then call
> > > > ll_rw_block(). (This is important because &(SB_BUFFER_WITH_SB(s)) is
> > > > global filesystem state that has to be read with the lock held.)
> > >
> > >
> > > Indeed &(SB_BUFFER_WITH_SB(s)) is a pointer to blocks that
> > > reflect the state of the filesystem but it was already not
> > > safe on the old code.
> > >
> >
> > SB_BUFFER_WITH_SB isn't going to change. It gets set once during
> > mount and will return the same buffer from then on.
>
> Are we holding a permanent reference to it after the first read? If
> yes then any bread done on an uptodate bh will return immediately
> without scheduling

Yes, it is pinned the whole time the FS is mounted. It looks to me like
the reread_meta_blocks() function can go away. The log replay should be
updating the same caches that the super block bh is in.

But, this isn't very performance critical stuff, it only happens in
reiserfs_fill_super(). I'd leave it.

-chris




--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/