Re: md: md6_raid5 crash 2.6.20

From: Marc Marais
Date: Sun Feb 11 2007 - 19:04:33 EST


On Mon, 12 Feb 2007 09:02:33 +1100, Neil Brown wrote
> On Sunday February 11, marcm@xxxxxxxxxxxxxxxx wrote:
> > Greetings,
> >
> > I've been running md on my server for some time now and a few days ago one of
> > the (3) drives in the raid5 array started giving read errors. The result was
> > usually system hangs and this was with kernel 2.6.17.13. I upgraded to the
> > latest production 2.6.20 kernel and experienced the same behaviour.
>
> System hangs suggest a problem with the drive controller. However
> this "kernel BUG" is something newly introduced in 2.6.20 which
> should be fixed in 2.6.20.1. Patch is below.
>
> If you still get hangs with this patch installed, then please report
> the details, and probably copy linux-ide@xxxxxxxxxxxxxxxx
>
> NeilBrown
>
> Fix various bugs with aligned reads in RAID5.
>
> It is possible for raid5 to be sent a bio that is too big
> for an underlying device. So if it is a READ that we
> pass straight down to a device, it will fail and confuse
> RAID5.
>
> So in 'chunk_aligned_read' we check that the bio fits within the
> parameters for the target device and if it doesn't fit, fall back
> on reading through the stripe cache and making lots of one-page
> requests.
>
> Note that this is the earliest time we can check against the device
> because earlier we don't have a lock on the device, so it could
> change underneath us.
>
> Also, the code for handling a retry through the cache when a read
> fails had not been tested and was badly broken. This patch fixes
> that code.
>
> Signed-off-by: Neil Brown <neilb@xxxxxxx>
>

Thanks for the quick response, Neil. Unfortunately, the kernel doesn't build
with this patch due to a missing symbol:

WARNING: "blk_recount_segments" [drivers/md/raid456.ko] undefined!

Is that in another file that needs patching or within raid5.c?

Marc
