Re: Possible fix for lockup in drop_caches

From: Richard Kennedy
Date: Mon Dec 24 2007 - 09:13:48 EST



On Sat, 2007-12-22 at 02:06 -0800, Andrew Morton wrote:
> On Mon, 17 Dec 2007 12:13:22 +0000 richard <richard@xxxxxxxxxxxxxxx> wrote:
>
> > fix lockup in when calling drop_caches
> >
> > calling /proc/sys/vm/drop_caches can hang due to a AB/BA lock dependency
> > between j_list_lock and the inode_lock. This patch moves the redirtying of the buffer head out
> > from under the j_list_lock.
> >
> > based on a suggestion by Andrew Morton.
> >
>
> Oh boy. Do we really want to add all this stuff to JBD just for
> drop_caches which is a silly root-only broken-in-22-other-ways thing?

It did end up with a lot of code but I was hoping that not taking 2
locks at the same time would have a positive performance benefit, not
just fix the lockup. But, so far I've not been able to find a benchmark
to show any consistent difference.

However, I'm getting some interesting numbers from iozone that suggest
this patch improves write performance when the buffers are small enough
to fit into memory, _but_ the results are very variable. I'm not sure
why the iozone results are so inconsistent, maybe oprofile will help.
Anyway more testing is needed :)

> Michael, might your convert-inode-lists-to-tree patches eliminate the need
> for taking inode_lock in drop_pagecache_sb()? Probably not, as it uses an
> rbtree. It would have been possible if it was using a radix-tree, I
> suspect..
>
> > -void __journal_unfile_buffer(struct journal_head *jh)
> > +void __journal_unfile_buffer(struct journal_head *jh,
> > + struct buffer_head **dirty_bh)
> > {
> > - __journal_temp_unlink_buffer(jh);
> > + __journal_temp_unlink_buffer(jh, dirty_bh);
> > jh->b_transaction = NULL;
> > }
>
> I suspect the code would end up simpler if __journal_unfile_buffer() were
> to take an additional ref on the bh which it placed at *dirty_bh.
>
> Callers of __journal_unfile_buffer() could then call
>
> void handle_dirty_bh(struct buffer_head *bh)
> {
> if (bh) {
> jbd_mark_buffer_dirty(bh);
> put_bh(bh);
> }
> }

thanks for the suggestion, that looks like a really clean approach, I'll
give it a try.

cheers
Richard


--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/