Re: [PATCH 00/16] DRBD: a block device for HA clusters

From: david
Date: Sun May 03 2009 - 12:16:14 EST


On Sun, 3 May 2009, James Bottomley wrote:

> Subject: Re: [PATCH 00/16] DRBD: a block device for HA clusters
>
> On Sun, 2009-05-03 at 08:48 -0700, david@xxxxxxx wrote:
>> On Sun, 3 May 2009, James Bottomley wrote:
>>
>>> On Sun, 2009-05-03 at 08:22 -0700, david@xxxxxxx wrote:
>>>> On Sun, 3 May 2009, James Bottomley wrote:


>>>>> This corruption situation isn't unique to replication ... any time you
>>>>> may potentially have allowed both sides to write to a data store, you
>>>>> get it, that's why it's the job of the HA harness to sort out whether a
>>>>> split brain happened and what to do about it *first*.
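
[A rough sketch of the arbitration being described. This is not DRBD's
actual data-generation algorithm; the generation IDs and the function are
invented for illustration. The idea: each node records a new generation ID
whenever it starts writing without its peer, so on reconnect the harness
can tell "one side is merely stale" apart from "both sides wrote".]

def detect_split_brain(local_generations, peer_generations):
    """Newest-first generation histories from each node."""
    if local_generations[0] == peer_generations[0]:
        return False    # same current generation: still in sync
    if local_generations[0] in peer_generations:
        return False    # peer is strictly ahead: local side is stale
    if peer_generations[0] in local_generations:
        return False    # local is strictly ahead: peer side is stale
    return True         # histories diverged: both sides wrote

if detect_split_brain(["g3", "g1"], ["g7", "g1"]):
    print("split brain: refuse automatic resync, escalate to the admin")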

>>>> but you can have packets sitting in the network buffers waiting to get to
>>>> the remote machine, then once the connection is reestablished those
>>>> packets will go out. no remounting needed, just connectivity restored.
>>>> (this isn't as bad as if the system tries to re-sync to the temporarily
>>>> unavailable drive by itself, but it can still corrupt things)

>>> This is an interesting thought, but not what happens. As soon as the HA
>>> harness stops replication, which it does at the instant failure is
>>> detected, the closure of the socket kills all the in-flight network
>>> data.
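
[For concreteness, one way the "closure kills in-flight data" behavior can
be made explicit at the socket level, sketched in Python. This shows only
standard TCP semantics, not DRBD's internals: with SO_LINGER set to
(on, 0 seconds), close() discards anything still queued in the send buffer
and resets the connection, so queued writes can never arrive late on the
peer after a reconnect.]

import socket
import struct

def abort_replication_link(sock):
    # l_onoff=1, l_linger=0: drop unsent data and send a RST on close()
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_LINGER,
                    struct.pack("ii", 1, 0))
    sock.close()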

>>> There is a variant of this problem that occurs with device mapper
>>> queue_if_no_path (on local disks) which does exactly what you say (keeps
>>> unsaved data around in the queue forever), but that's fixed by not using
>>> queue_if_no_path for HA. Maybe that's what you were thinking of?
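
[queue_if_no_path itself lives in device mapper, but the contrast being
drawn can be modeled in a few lines. A toy Python class, with invented
names: one policy holds writes indefinitely while no path is available --
the hazard for HA -- while fail-fast errors them immediately so the HA
layer can react.]

class MultipathDevice:
    def __init__(self, queue_if_no_path):
        self.queue_if_no_path = queue_if_no_path
        self.paths_available = False
        self.pending = []               # writes held while pathless

    def write(self, data):
        if self.paths_available:
            return "submitted"
        if self.queue_if_no_path:
            self.pending.append(data)   # may linger forever -- the hazard
            return "queued"
        raise IOError("no path available")  # fail fast: HA layer decides

    def path_restored(self):
        self.paths_available = True
        flushed, self.pending = self.pending, []
        return flushed                  # stale writes would hit disk now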

>> is there a mechanism in nbd that prevents it from being mounted more than
>> once? if so then it could have the same protection that DRBD has; if not,
>> it is possible for it to be mounted in more than one place and therefore
>> get corrupted.

> That's not really relevant, is it? An ordinary disk doesn't have this
> property either. Mediating simultaneous access is the job of the HA
> harness. If the device does it for you, fine, the harness can make use
> of that (as long as the device gets it right) but all good HA harnesses
> sort out the usual case where the device doesn't do it.

with a local disk you can mount it multiple times, write to it from all the
mounts, and not have any problems, because all access goes through a common
layer.

you would have this sort of problem if you used one partition as part of
multiple md arrays, but the md layer itself would detect and prevent this
(because it would see both arrays). in a multi-machine situation, though,
you don't have a common layer to do the detection.
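
[A sketch of the kind of common-layer check being described, with
hypothetical names. Real md records an array UUID in each member's on-disk
superblock, which is what lets a single machine notice the double use; two
machines sharing a network device have no such shared vantage point.]

active_arrays = {}   # array_uuid -> set of member device names

def read_member_uuid(device):
    # placeholder: a real implementation would read the md superblock
    # from the device and return the UUID of the array it belongs to,
    # or None if the device is unused
    return None

def add_member(array_uuid, device):
    existing = read_member_uuid(device)
    if existing is not None and existing != array_uuid:
        raise RuntimeError("%s already belongs to array %s"
                           % (device, existing))
    active_arrays.setdefault(array_uuid, set()).add(device)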

you can rely on the HA layer to detect and prevent all of this (and
apparently there are people doing this, I wasn't aware of it), but I've seen
enough problems with every HA implementation I've dealt with over the years
(both open source and commercial) that I would be very uncomfortable
depending on this exclusively. having the disk replication layer detect this
adds a significant amount of safety in my eyes.
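
[What that extra safety might look like at the replication layer, sketched
with invented names. DRBD's single-primary rule works in this spirit,
refusing to promote a node to writer while the peer already is one.]

class ReplicatedDevice:
    def __init__(self):
        self.local_role = "secondary"
        self.peer_role = "secondary"   # learned over the replication link

    def promote(self, force=False):
        # refuse to become a second writer while the peer is primary
        if self.peer_role == "primary" and not force:
            raise RuntimeError("peer is primary: refusing second writer")
        self.local_role = "primary"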

>>> There are commercial HA products based on md/nbd, so I'd say it's also
>>> hardened for harsher environments

>> which ones?

> SteelEye LifeKeeper. It actually supports both drbd and md/nbd.

thanks for the info.

David Lang