Re: [PATCH 00/16] DRBD: a block device for HA clusters

From: James Bottomley
Date: Tue May 05 2009 - 17:54:18 EST


On Tue, 2009-05-05 at 23:45 +0200, Philipp Reisner wrote:
> > I also think you're not quite looking at the important case: if you
> > think about it, the real necessity for the ordered domain is the
> > network, not so much the actual secondary server. The reason is that
> > it's very hard to find a failure case where the write order on the
> > secondary from the network tap to disk actually matters (as long as the
> > flight into the network tap was in order). The standard failure is of
> > the primary, not the secondary, so the network stream stops and so does
> > the secondary writing: as long as we guarantee to stop at a consistent
> > point in flight, everything works. If the secondary fails while the
> > primary is still up, that's just a standard replay to bring the
> > secondary back into replication, so the issue doesn't arise there
> > either.
>
> A common power failure is possible. We aim for an HA system; we cannot
> ignore a possible failure scenario. No user will buy: "Well, in most
> scenarios we do it correctly, but in the unlikely case of a common power
> failure where you lose your former primary at the same time, you might
> have a secondary with the last write but not the write before it!"
>
> Correctness before efficiency!

Well, you have to agree that during a resync from the activity log,
which replays the primary disk from one end to the other, the secondary
is completely corrupt if the primary fails before the resync completes.
A resync is triggered by a network outage, which is a far more common
event than cascading dual failures. It's all really a question of where
you focus your effort to eliminate the corner cases.
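A toy model of the distinction being argued: a secondary that applies an in-order write stream and stops anywhere always holds a state equal to some prefix of the primary's write history, whereas a resync that copies changed blocks in block-address order can be interrupted in a state that matches no prefix. All names here (`apply`, `prefix_states`, `ordered_replica_cut`, `resync_cut`) are hypothetical, and the block-address-order copy is an assumed simplification, not DRBD's actual resync algorithm.

```python
# Toy model: a "disk" is a dict mapping block number -> contents,
# and a write history is an ordered list of (block, value) pairs.

def apply(writes, initial):
    """Return a new disk state after applying writes, in order, to initial."""
    disk = dict(initial)
    for block, value in writes:
        disk[block] = value
    return disk

def prefix_states(initial, history):
    """Every state reachable by applying some prefix of history in order.
    These are the crash-consistent states a replica may legitimately hold."""
    return [apply(history[:n], initial) for n in range(len(history) + 1)]

def ordered_replica_cut(initial, history, n):
    """Secondary fed an in-order stream that stopped after n writes."""
    return apply(history[:n], initial)

def resync_cut(initial, primary, n):
    """Secondary being resynced: copy the primary's differing blocks in
    ascending block order (assumed), interrupted after n blocks."""
    dirty = sorted(b for b in primary if primary[b] != initial.get(b))
    return apply([(b, primary[b]) for b in dirty[:n]], initial)

if __name__ == "__main__":
    initial = {0: "old", 1: "old"}
    # Application writes data to block 1 first, then a commit record to block 0.
    history = [(1, "data"), (0, "commit")]
    primary = apply(history, initial)
    states = prefix_states(initial, history)

    for n in range(len(history) + 1):
        print("ordered cut", n, ordered_replica_cut(initial, history, n) in states)
    for n in range(len(history) + 1):
        print("resync cut ", n, resync_cut(initial, primary, n) in states)
```

Interrupting the resync after one block leaves the commit record on block 0 without the data it commits on block 1, a state no ordered prefix can produce; every cut of the ordered stream, by construction, is one of the prefix states.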

> But I will stop this discussion now. Proving that DRBD does some
> details better than the md/nbd approach becomes pointless now that we
> have agreed that DRBD can be merged as a driver. We will focus on the
> necessary code cleanups.

I agree. Also, HA is full of corner cases like this, and opinion is
endlessly divided over which corner cases matter more than others.

James

