Re: [BUG] raid5 crash with 2.4.0-test12 [Was: Linux-2.4.0-test12]

From: Neil Brown (neilb@cse.unsw.edu.au)
Date: Tue Dec 12 2000 - 18:21:56 EST


On Tuesday December 12, jasper@spaans.ds9a.nl wrote:
> On Tue, Dec 12, 2000 at 11:06:07AM -0800, Linus Torvalds wrote:
> >
> > To get better debug output, could you please do something for me?
> >
> > In fs/buffer.c, get rid of "end_buffer_io_bad" completely, and replace all
> > users of it with NULL.
> >
> > Then, in drivers/block/ll_rw_block.c: generic_make_request(), add a test
> > like
> >
> > if (!bh->b_end_io) BUG();
> >
> > to the top of that function.

Could you add this test to the top of md_make_request as well?
Requests to raid5 don't go through generic_make_request.
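
Roughly what I mean, as a user-space sketch rather than a patch. The
struct, the BUG() stand-in and the mock_md_make_request name are all
made up for illustration (assumptions, not the real kernel
definitions); only the b_end_io test itself comes from Linus's
suggestion above.

```c
#include <assert.h>
#include <stddef.h>

/* Stand-in for the kernel's struct buffer_head: only the field under test. */
struct buffer_head {
    void (*b_end_io)(struct buffer_head *bh, int uptodate);
};

/* Stand-in for BUG(): count hits instead of oopsing the machine. */
static int bug_hits;
#define BUG() (bug_hits++)

/* Hypothetical mock of the md entry point; the real function takes
 * more arguments and actually dispatches to the personality. */
static int mock_md_make_request(int rw, struct buffer_head *bh)
{
    if (!bh->b_end_io)
        BUG();          /* the same test, placed at the top */
    (void)rw;
    return 0;
}

/* Dummy completion handler so a "good" buffer has a non-NULL b_end_io. */
static void dummy_end_io(struct buffer_head *bh, int uptodate)
{
    (void)bh;
    (void)uptodate;
}

/* Exercise both cases: a normal buffer and one with b_end_io == NULL. */
static void run_example(void)
{
    struct buffer_head good = { dummy_end_io };
    struct buffer_head bad  = { NULL };

    bug_hits = 0;
    mock_md_make_request(0, &good);
    assert(bug_hits == 0);   /* handler present: no BUG */
    mock_md_make_request(0, &bad);
    assert(bug_hits == 1);   /* handler missing: BUG fires once */
}
```

In the real kernel the check would just be the one-line test at the
top of md_make_request; the rest here is scaffolding so the sketch
compiles on its own.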

>
> Strange thing is that it doesn't call BUG() and the trace seems quite
> identical -- this caused me to start looking at the code in
> drivers/md/raid5.c and it seems this null pointer deref is coming from there
> - Neil, do you have some documentation on how this code should work, as
> stripe_head causes some null-pointer-derefs inside my head..

No, no doco, sorry.
I do have a new version of the code that I haven't been brave enough
to submit during a code freeze (whatever that is)... you could try
the raid5 patch under
  http://www.cse.unsw.edu.au/~neilb/patches/linux/2.4.0-test12-pre8

I expect that you will get the same result, as I don't (currently)
think the bug is in the RAID code, but at least I would get one more
tester for my code....

NeilBrown