Possible Bug with MD multipath and raid1 on top

From: Oktay Akbal (oktay.akbal@s-tec.de)
Date: Sat Sep 14 2002 - 13:33:07 EST


Hello,

I ran into a very strange effect when using a raid1 on top of multipath
with kernel 2.4.18 (the SuSE version of it) and a 2-port QLogic HBA
connecting two arrays.

The raidtab used to set this up is:

raiddev /dev/md0
        raid-level      multipath
        nr-raid-disks   1
        nr-spare-disks  1
        chunk-size      32

        device          /dev/sda1
        raid-disk       0

        device          /dev/sdc1
        spare-disk      1

raiddev /dev/md1
        raid-level      multipath
        nr-raid-disks   1
        nr-spare-disks  1
        chunk-size      32

        device          /dev/sdb1
        raid-disk       0

        device          /dev/sdd1
        spare-disk      1

raiddev /dev/md2
        raid-level      1
        nr-raid-disks   2
        nr-spare-disks  0
        chunk-size      32

        device          /dev/md0
        raid-disk       0

        device          /dev/md1
        raid-disk       1
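
As a quick sanity check (a minimal sketch, not part of my original
setup), one can verify from a small C program that all three arrays
came up by scanning /proc/mdstat for the md0/md1/md2 lines defined
above:

#include <stdio.h>
#include <string.h>

int main(void)
{
    const char *want[] = { "md0", "md1", "md2" };
    int seen[3] = { 0, 0, 0 };
    char line[256];
    int i;
    FILE *f = fopen("/proc/mdstat", "r");

    if (!f) {
        perror("fopen /proc/mdstat");
        return 1;
    }
    /* Each running array shows up as a line like
     * "md2 : active raid1 md1[1] md0[0] ..." */
    while (fgets(line, sizeof(line), f)) {
        for (i = 0; i < 3; i++)
            if (strncmp(line, want[i], 3) == 0 && strstr(line, "active"))
                seen[i] = 1;
    }
    fclose(f);
    for (i = 0; i < 3; i++)
        if (!seen[i]) {
            fprintf(stderr, "%s missing or not active\n", want[i]);
            return 1;
        }
    printf("md0, md1 and md2 are active\n");
    return 0;
}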

As you can see, one port of the HBA "sees" sda and sdb, the second port
sdc and sdd.
When I pull out one of the cables, two disks go missing; the multipath
driver correctly switches to the second path to the disks and continues
to work. After unplugging the second cable as well, all drives are
marked as failed in /proc/mdstat, but the raid1 (md2) is still reported
as functional with one device (md0) missing.
(Sorry, I do not have the output at hand, but md2 was reported as [_U],
while sda-sdd were marked [F].)

All processes using the raid1 device get stuck, and the situation does
not recover. Even a simple process that only tested disk access got
stuck (I think ps showed it in state D, uninterruptible sleep).
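
For reference, the state can also be read directly from
/proc/<pid>/stat instead of ps; the third field there is the state
character, and 'D' means uninterruptible sleep. A minimal sketch (the
parsing assumes a command name without spaces):

#include <stdio.h>
#include <stdlib.h>

int main(int argc, char **argv)
{
    char path[64], comm[256], state;
    int pid;
    FILE *f;

    if (argc != 2) {
        fprintf(stderr, "usage: %s <pid>\n", argv[0]);
        return 1;
    }
    pid = atoi(argv[1]);
    snprintf(path, sizeof(path), "/proc/%d/stat", pid);
    f = fopen(path, "r");
    if (!f) {
        perror("fopen");
        return 1;
    }
    /* Field 1 is the pid, field 2 the command in parentheses,
     * field 3 the one-character state. */
    if (fscanf(f, "%d %255s %c", &pid, comm, &state) != 3) {
        fclose(f);
        fprintf(stderr, "unexpected /proc format\n");
        return 1;
    }
    fclose(f);
    printf("pid %d state %c%s\n", pid, state,
           state == 'D' ? " (uninterruptible sleep)" : "");
    return 0;
}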

Even though I'm quite sure that this is a bug: how should I test disk
access without ending up in "uninterruptible sleep"?
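
The only workaround I can think of (a sketch only; /dev/md2 and the
10-second deadline are illustrative values, not from my test) is to do
the actual read in a child process and let the parent poll with
waitpid() and a deadline. The parent never issues I/O itself, so it
stays responsive even when the child is stuck in D state:

#include <fcntl.h>
#include <stdio.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

int main(void)
{
    int i;
    pid_t pid = fork();

    if (pid < 0) {
        perror("fork");
        return 1;
    }
    if (pid == 0) {
        /* Child: one small read from the array; this is the part
         * that can hang forever in D state. */
        char buf[512];
        int fd = open("/dev/md2", O_RDONLY);
        if (fd < 0)
            _exit(2);
        _exit(read(fd, buf, sizeof(buf)) == (ssize_t)sizeof(buf) ? 0 : 2);
    }

    /* Parent: poll with WNOHANG for up to 10 seconds.  A child stuck
     * in D state cannot be killed; it only exits once its I/O
     * completes or fails, so on timeout we just report and leave it. */
    for (i = 0; i < 10; i++) {
        int status;
        if (waitpid(pid, &status, WNOHANG) == pid)
            return WIFEXITED(status) ? WEXITSTATUS(status) : 1;
        sleep(1);
    }
    fprintf(stderr, "disk test timed out; child %d is probably stuck\n",
            (int)pid);
    return 3;
}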

Thanks

Oktay Akbal
