Re: [BUG] raid5 crash with 2.4.0-test12 [Was: Linux-2.4.0-test12]

From: Linus Torvalds (torvalds@transmeta.com)
Date: Tue Dec 12 2000 - 14:06:07 EST


On Tue, 12 Dec 2000, Jasper Spaans wrote:

> On Mon, Dec 11, 2000 at 06:52:55PM -0800, Linus Torvalds wrote:
> >
> > Ok, there it is. Noticeable changes from pre8 are mainly (a) new tq list
> > compile fixes and (b) the NetApp snapshot thing.
>
> > - final:
> > - Neil Brown: raid and md cleanups
>
> Hmm, while doing some not-so-heavy things with a mysql-db on a raid5-device
> this kernel Oopsed on me; ksymoops output [which went through klogd,
> shouldn't matter that much, klogd was using the right System.map]:
>
> Dec 12 14:04:50 spaans kernel: invalid operand: 0000
> Dec 12 14:04:50 spaans kernel: CPU: 1
> Dec 12 14:04:50 spaans kernel: EIP: 0010:[end_buffer_io_bad+85/92]
>
> Dec 12 14:04:50 spaans kernel: Call Trace:
> [raid5_end_buffer_io+68/128]
> [complete_stripe+151/272]
> [handle_stripe+331/1092]
> [raid5d+173/260]
> [md_thread+299/508]

Looks like somebody didn't initialize the "b_end_io" pointer - the code
defaults to it being "end_buffer_io_bad" (which oopses unconditionally on
purpose exactly to find places where it wasn't initialized).

And it obviously looks like it's the raid5 code that does it.

It _looks_ like the raid5 code does a "generic_make_request()" without
setting b_end_io anywhere, but I don't know the raid5 code well enough.

To get better debug output, could you please do something for me?

In fs/buffer.c, get rid of "end_buffer_io_bad" completely, and replace all
users of it with NULL.

Then, in drivers/block/ll_rw_block.c: generic_make_request(), add a test
like

        if (!bh->b_end_io) BUG();

to the top of that function.

You'll still get a oops, but the difference is that you'll get the oops
when the request is queued, rather than when the requst is finished, which
will make it easier to figure out what the thing is that leads up to this.

In the meantime I'm sure Neil can figure out where in the raid5 code we
don't initialize the buffer head properly even without that, but it's
worth doing the above anyway.

        Thanks,

                Linus

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@vger.kernel.org
Please read the FAQ at http://www.tux.org/lkml/



This archive was generated by hypermail 2b29 : Fri Dec 15 2000 - 21:00:25 EST