Re: BUG: ext3 hang in transaction commit

From: David Chinner
Date: Thu Apr 03 2008 - 21:27:53 EST


On Thu, Apr 03, 2008 at 12:07:42PM +0200, Jan Kara wrote:
> Hi,
>
> > ia32 XFS QA machine, ext3 root on a raw partition. 2.6.25-rc3.
> >
> > kjournald hung in journal_commit_transaction():
> >
> > Stack traceback for pid 2046
> > 0xf6e0b350 2046 2 0 0 D 0xf6e0b550 kjournald
> > esp eip Function (args)
> > 0xf68d5e70 0xc04c20b2 schedule+0x51e
> > 0xf68d5ec0 0xc04c2394 io_schedule+0x1d
> > 0xf68d5ecc 0xc0179b64 sync_buffer+0x33 (invalid)
> > 0xf68d5ed4 0xc04c2570 __wait_on_bit+0x36 (0xc2000b78, 0xf68d5f00, 0xc0179b31, 0x2)
> > 0xf68d5ef0 0xc04c25ef out_of_line_wait_on_bit+0x58 (0xde89bc18, 0x2, 0xc0179b31, 0x2)
> > 0xf68d5f2c 0xc0179adf __wait_on_buffer+0x19
> > 0xf68d5f38 0xc01cecba journal_commit_transaction+0x40b (0xf6d94a00)
> > 0xf68d5fa0 0xc01d180a kjournald+0xa4 (0xf6d94a00)
> > 0xf68d5fd4 0xc01301f1 kthread+0x3b (invalid)
>
> I suppose this is wait_on_buffer() in line 444 in fs/jbd/commit.c, isn't it?

No idea. I haven't looked at the code....

> > We're waiting on the last page/buffer in the file, and it doesn't appear
> > to be under writeback....
> We wait for write of ordered-data to finish. Which seems to never
> happen. Page isn't under writeback, but that just means we submitted the
> buffer from the commit code (that doesn't change the page state).
> Anyway, the cause is that either due to some bug IO never finished and

Yes, I certainly believe that is possible. We see it often enough with
XFS....

> so buffer never got unlocked, or we somewhere locked the buffer and
> forgot to unlock it (but I've checked all the relevant places and think
> they are correct). The traces of all the processes seem harmless - I see
> no trace where we are holding a buffer lock.
> If you happen to hit this again, please let me know and I'll look into
> it further...

Will do.

Cheers,

Dave.
--
Dave Chinner
Principal Engineer
SGI Australian Software Group