Re: clustered MD

From: Goldwyn Rodrigues
Date: Wed Jun 10 2015 - 12:08:55 EST



On 06/10/2015 10:01 AM, David Teigland wrote:
> On Tue, Jun 09, 2015 at 10:33:08PM -0500, Goldwyn Rodrigues wrote:
>>>>> some real world utility to warrant the potential maintenance effort.

>>>> We do have a valid real world utility. It is to provide
>>>> high-availability of RAID1 storage over the cluster. The
>>>> distributed locking is required only during cases of error and
>>>> superblock updates and is not required during normal operations,
>>>> which makes it fast enough for usual case scenarios.

>>> That's the theory, how much evidence do you have of that in practice?

>> We wanted to develop a solution which is lock-free (or at least
>> minimal) for the most common/frequent usage scenario. Also, we
>> compared it with iozone on top of ocfs2 and found it very close to
>> local device performance numbers. We compared it with cLVM mirroring
>> and found it better as well. However, in the future we would want to
>> use it with other RAID (10?) scenarios, which are missing now.

> OK, but that's the second time you've missed the question I asked about
> examples of real world usage. Given the early stage of development, I'm
> supposing there is none, which also implies it's too early for merging.


I thought I answered that:
To use a software RAID1 array across multiple nodes of a cluster. Let me explain in more detail.

Consider a cluster of multiple nodes attached to shared storage, such as a SAN. The shared device becomes a single point of failure: if it loses power, you lose everything. The proposed solution is to use software RAID, say with two SAN switches backed by different devices, and to create a RAID1 across them. If you lose power on one switch, or one of the devices fails, the other is still available. Once the failed switch/device comes back up, the devices are resynced.
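As a rough sketch of that setup (the device paths below are hypothetical, and the clustered write-intent bitmap option comes from the mdadm support for this feature, so the exact syntax may vary by version), the array would be created on one node and assembled on the others, with the DLM/corosync cluster stack already running:

  # On one node: create a RAID1 from one device behind each SAN switch,
  # using a cluster-aware write-intent bitmap.
  mdadm --create /dev/md0 --level=mirror --raid-devices=2 --bitmap=clustered \
        /dev/disk/by-path/san-switch-a-lun0 /dev/disk/by-path/san-switch-b-lun0

  # On each other node: assemble the same array from the shared devices.
  mdadm --assemble /dev/md0 \
        /dev/disk/by-path/san-switch-a-lun0 /dev/disk/by-path/san-switch-b-lun0

Each node then does normal I/O to /dev/md0; the cluster locking only comes into play for bitmap/superblock updates and error handling, as described earlier in the thread.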

What are the doubts you have about it?

>>> Before I begin reviewing the implementation, I'd like to better understand
>>> what it is about the existing raid1 that doesn't work correctly for what
>>> you'd like to do with it, i.e. I don't know what the problem is.

>> David Lang has already responded: The idea is to use a RAID device
>> (currently only level 1 mirroring is supported) with multiple nodes
>> of the cluster.

> That doesn't come close to answering the question: exactly how do you want
> to use raid1 (I have no idea from the statements you've made)

Using software RAID1 on a cluster with shared devices.


> , and exactly
> what breaks when you use raid1 in that way? Once we've established the
> technical problem, then I can fairly evaluate your solution for it.


Data consistency breaks. If node 1 is writing to the RAID1 device, you have to make sure the data on the two mirrored devices stays consistent. With software RAID this is done with write-intent bitmaps, and the DLM is used to maintain that consistency across the nodes.

Device failure can be partial. Say only node 1 sees that one of the devices has failed (a link break). You need to "tell" the other nodes not to use that device and that the array is degraded.

In case of node failure, the blocks the failed node was writing must be resynced before the cluster can continue normal operation.
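The bitmap/DLM coordination itself lives inside the kernel, but operationally the device-failure case above looks roughly like this (hypothetical device path, standard mdadm commands):

  # Node 1 sees I/O errors through switch A; the device gets marked faulty
  # (shown here explicitly) and the array keeps running degraded.
  mdadm --manage /dev/md0 --fail /dev/disk/by-path/san-switch-a-lun0

  # Once the switch/device is back, re-add it; with a write-intent bitmap
  # only the regions dirtied while it was missing need to be resynced.
  mdadm --manage /dev/md0 --re-add /dev/disk/by-path/san-switch-a-lun0

  cat /proc/mdstat   # watch the degraded/recovery state

The point of the clustered machinery is that this degraded/resync state has to be agreed on by every node, not just the one that saw the failure.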

Does that explain the situation?


--
Goldwyn