Re: unmount oops in log_do_checkpoint

From: Nick Piggin
Date: Wed Jan 18 2006 - 00:40:08 EST


On Tue, Jan 17, 2006 at 11:23:53PM +0100, Jan Kara wrote:
> > On Tue, Jan 17, 2006 at 05:32:35PM +0100, Jan Kara wrote:
> > > > On Tue, Jan 17, 2006 at 12:59:45PM +0100, Nick Piggin wrote:
> > > >
> > > > Maybe it is because people haven't been turning on their debugging options,
> > > > tsk tsk ;) It only oopses when DEBUG_SLAB and DEBUG_PAGEALLOC are both
> > > > enabled. And only then when the jbd patch is not reverted. Weird.
> > > Hmm, that's really strange, maybe we have some use-after-free
> > > problem or so... I'll see what I can do :).
> > >
> >
> > Are you able to reproduce? If not I can test patches...
> Hmm, I was not able to reproduce the problem even with those debug
> options set :(. As I'm looking into the code it seems that somebody
> managed to free the transaction but did not clear the
> j_checkpoint_transactions pointer. It's even stranger that it's during
> umount time when there should not be much processes playing with the JBD
> structures on that filesystem.
> Attached is the patch that fixes two minor possible problems I've
> found. Neither of them should be causing your oops but one never knows
> :). Also turn on the JBD debugging in config. Maybe it spits something
> useful. If you still see the same oops, I'll create some debugging
> patch.

This patch does the trick. Survived several reboots now while without
the patch it has oopsed 100% of the time so far. Thanks!

I have also attached a full jbd debug output and oops for the vanilla
2.6.16-rc1 case, just in case that helps.

> BTW: the oops during umount is after some activity on the filesystem
> or you just mount & umount?
>

mount,unmount doesn't seem to trigger it, nor does a bit of filesystem
activity. I haven't tracked down exactly what *does* trigger it, but
booting then rebooting does it every time.

Nick

Attachment: dmesg.bad.gz
Description: application/gunzip