Re: SV: PROBLEM: raid5 just dies

From: Neil Brown
Date: Mon Oct 30 2006 - 17:47:10 EST


On Monday October 30, andreas.paulsson@xxxxxxxxxxx wrote:
> >Exactly how are aes-loop and raid5 connected together?
>
> We use 5x300gb drives in a raid5 array, which is then used as a physical
> disk in an lvm volume, with one logical volume. This logical volume is
> then encrypted with "losetup -e aes /dev/loop1 /dev/vg0/lv0", and then
> formatted with ReiserFS.

Thanks.

It could be a hardware problem.
The symptom is that we try to free some memory and a consistency check
tells us that the memory wasn't allocated, so a single-bit error in
the address could be the cause. Running memtest86 for a while
wouldn't hurt if you haven't already done that.
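
To illustrate why one flipped bit is enough, here is a toy model in
Python. It is not kernel code: the address set, flip_bit() and free()
are all made up for the example, and the real consistency check lives
in the kernel's allocator.

```python
# Toy model only. The addresses below are hypothetical values standing in
# for memory that an allocator has handed out.
allocated = {0x1000, 0x2000, 0x3000}

def flip_bit(addr, bit):
    """Return addr with a single bit inverted, as a memory fault might do."""
    return addr ^ (1 << bit)

def free(addr):
    """Refuse to free an address that was never allocated."""
    if addr not in allocated:
        raise ValueError(f"freeing unallocated address {addr:#x}")
    allocated.remove(addr)

good = 0x2000
bad = flip_bit(good, 4)   # 0x2010: one bit away from a valid address

free(good)                # succeeds, the address was allocated
try:
    free(bad)             # fails the consistency check
except ValueError as err:
    print(err)            # freeing unallocated address 0x2010
```

The software did nothing wrong in this model; the address was simply
corrupted between allocation and free, which is the kind of fault
memtest86 is meant to expose.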

You have three layers here: loop over dm over md/raid5.
So if it is a software problem it could be in any of these layers, or
in an interaction between two of them.

1/ How repeatable is this?
2/ How much room have you got to experiment?
   Could you remake the array without the loop/aes and see if you can
   reproduce the problem?
   Could you remake the array without the LVM layer and see if you can
   reproduce the problem?
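
The experiments above amount to removing one optional layer at a time.
A rough sketch that enumerates the stacks worth testing follows; the
layer labels and stack_variants() are just names chosen for this
example, and the bare md/raid5 case is an extra I am assuming is also
cheap to try.

```python
from itertools import combinations

# md/raid5 is always the bottom layer; the other two can be left out.
OPTIONAL_LAYERS = ("dm (LVM)", "loop/aes")

def stack_variants():
    """Yield each stack to test, bottom layer first, simplest stack first."""
    for n in range(len(OPTIONAL_LAYERS) + 1):
        for extra in combinations(OPTIONAL_LAYERS, n):
            yield ("md/raid5",) + extra

for stack in stack_variants():
    # Print top layer first, matching "loop over dm over md/raid5".
    print(" over ".join(reversed(stack)))
```

If the problem shows up in a stack that lacks a given layer, that layer
is unlikely to be the only culprit; if it disappears whenever a layer is
removed, that layer or its interaction with a neighbour becomes the
prime suspect.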

Do you have CONFIG_DEBUG_PAGEALLOC and CONFIG_DEBUG_SLAB set? If not,
could you recompile with those set to see if they provide more helpful
information?
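
One way to check is to scan the kernel config text for the two options.
The following is a minimal sketch; missing_options() and the sample
text are invented for illustration, and it relies only on the usual
.config line formats ("CONFIG_X=y" and "# CONFIG_X is not set").

```python
import re

WANTED = ("CONFIG_DEBUG_PAGEALLOC", "CONFIG_DEBUG_SLAB")

def missing_options(config_text, wanted=WANTED):
    """Return the options from `wanted` that are not set to 'y' in config_text."""
    enabled = set(re.findall(r"^(CONFIG_\w+)=y$", config_text, flags=re.MULTILINE))
    return [opt for opt in wanted if opt not in enabled]

# Made-up sample in .config format, not taken from the reporter's system.
sample = """\
CONFIG_DEBUG_KERNEL=y
CONFIG_DEBUG_SLAB=y
# CONFIG_DEBUG_PAGEALLOC is not set
"""
print(missing_options(sample))   # ['CONFIG_DEBUG_PAGEALLOC']
```

The config text can be read from the .config file in the kernel build
tree, or from /proc/config.gz if the running kernel was built with
CONFIG_IKCONFIG_PROC.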

I must admit I am somewhat at a loss. I cannot see much room for a
bug on the path to that particular point in the code that would not
also be hit by many more people than just you.

NeilBrown
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/