Re: linux-2.5.4-pre1 - bitkeeper testing

From: Larry McVoy
Date: Mon Feb 11 2002 - 10:00:09 EST

On Mon, Feb 11, 2002 at 12:20:57AM -0800, Josh MacDonald wrote:
> > In essence arch isn't that different from RCS in that arch is
> > fundamentally a diff&patch based system with all of the limitations that
> > implies. There are very good reasons that BK, ClearCase, Aide De Camp,
> > and other high end systems, are *not* diff&patch based revision histories,
> > they are weave based. Doing a weave based system is substantially
> > harder but has some nice attributes. Simple things like extracting any
> > version of the file takes constant time, regardless of which version.
> > Annotated listings work correctly in the face of multiple branches and
> > multiple merges. The revision history files are typically smaller,
> > and are much smaller than arch's context diff based patches.
> Larry, I don't think you're saying what you mean to say, and I doubt
> that what you mean to say is even correct. In fact, I've got a
> master's thesis to prove you wrong. I wish you would spend more time
> thinking and less time flaming or actually _do some research_.

Right, of course your master's thesis is much better than the fact that
I've designed and shipped 2 SCM systems, not to mention the 7 years of
research and development, which included reading lots of papers, even
the xdelta papers.

> But as I understand your weave method, it's not even linear as a
> function of version size, it's a function of _archive size_. The
> archive is the sum of the versions woven together, and that means your
> operation is really O(N+A) = O(A).

The statement was "extracting *any* version of the file takes constant
time, regardless of which version". It's a correct statement, and
attempts to twist it into something else won't work. And no diff&patch
based system can hope to achieve the same performance.
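For readers following along, here is a minimal sketch of the idea behind a weave: every line of every version lives in one interleaved file, tagged with the delta that inserted it and (if any) the delta that deleted it, so extracting any version is a single pass over the weave. The tuple layout and names below are illustrative, in the style of SCCS, and are not BitKeeper's actual on-disk format:

```python
# Sketch of SCCS-style weave extraction (illustrative layout, not BK's
# actual format).  Each weave entry is (line, inserted_by, deleted_by);
# deleted_by is None if the line was never deleted.

def extract(weave, version_serials):
    """Return one version's lines in a single pass over the weave.

    version_serials -- set of delta serials in the requested version's
                       ancestry; membership decides which lines appear
    """
    out = []
    for line, inserted_by, deleted_by in weave:
        if inserted_by in version_serials and deleted_by not in version_serials:
            out.append(line)
    return out

# Three versions woven together: v1 = {1}, v2 = {1, 2}, v3 = {1, 2, 3}
weave = [
    ("hello",  1, None),  # present since v1
    ("old",    1, 2),     # inserted in v1, deleted in v2
    ("new",    2, None),  # inserted in v2
    ("newest", 3, None),  # inserted in v3
]

print(extract(weave, {1}))        # ['hello', 'old']
print(extract(weave, {1, 2, 3}))  # ['hello', 'new', 'newest']
```

Note that the pass touches every weave entry regardless of which version is asked for, which is exactly the "constant time, regardless of which version" property: old versions cost no more than the tip.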

> Let's suppose that by "diff&patch" you really mean RCS--generalization
> doesn't work here.

Actually, generalization works just fine. By diff&patch I mean any system
which incrementally builds up the result in multiple passes, where the
number of passes == the number of deltas needed to build the final result.
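To make the contrast concrete, here is a toy sketch of chain-of-deltas reconstruction, the scheme being called "diff&patch" here. The edit-script format and names are invented for illustration; real systems like RCS differ in detail (RCS keeps the tip whole and stores reverse deltas), but the cost shape is the same: one full pass per delta in the chain.

```python
# Illustrative chain-of-deltas reconstruction: building a version means
# applying each delta in sequence, one full pass over the text per delta.

def apply_delta(lines, delta):
    """Apply one delta (a list of (cmd, pos, payload) edit commands)."""
    out = list(lines)
    # Apply highest positions first so earlier offsets stay valid.
    for cmd, pos, payload in sorted(delta, key=lambda c: -c[1]):
        if cmd == "ins":            # insert payload lines at index pos
            out[pos:pos] = payload
        elif cmd == "del":          # delete `payload` lines starting at pos
            del out[pos:pos + payload]
    return out

def checkout(baseline, deltas):
    """Cost grows with the number of deltas -- the multipass being argued."""
    lines = list(baseline)
    for delta in deltas:
        lines = apply_delta(lines, delta)
    return lines

baseline = ["hello", "old"]
deltas = [
    [("del", 1, 1), ("ins", 1, ["new"])],  # v1 -> v2: replace "old"
    [("ins", 2, ["newest"])],              # v2 -> v3: append a line
]
print(checkout(baseline, deltas))  # ['hello', 'new', 'newest']
```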

> Your argument supposes that the cost of an RCS
> operation is dominated by the application of an unbounded chain of
> deltas. The cost of that sub-operation is linear as a function of the
> chain length C. RCS has to process a version of length N, an archive
> of size A, and a chain of length C giving a total time complexity of
> O(N+C+A) = O(C+A).
> And you would like us to believe that the weave method is faster
> because of that extra processing cost (i.e., O(A) < O(C+A)). I doubt
> it.

Hmm, aren't you the guy who started out this post accusing me (with
no grounds, I might add) of not doing my homework? And how are we
to take this "I doubt it" statement? Like you didn't go do your
homework, perhaps?

> [etc]
> The cost of these operations is likely dominated by the number of
> blocks transferred to or from the disk, period.

I notice that in your entire discussion, you failed to address my point
about annotated listings. Perhaps you could explain to us how you get an
annotated listing out of arch or xdelta and the performance implications
of doing so.
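The annotation point can be made concrete: in a weave, every line already carries the serial of the delta that inserted it, so an annotated listing falls out of the same single pass used for extraction, branches and merges included; a diff&patch system has to replay the delta chain to reconstruct the same attribution. A toy sketch with an invented layout, for illustration only:

```python
# Sketch: annotation from a weave is the same single pass as extraction,
# since each entry (line, inserted_by, deleted_by) already names the
# delta that introduced the line.  Illustrative, not any real SCM format.

def annotate(weave, version_serials):
    """Return (inserting_delta, line) pairs for one version, in one pass."""
    return [(inserted_by, line)
            for line, inserted_by, deleted_by in weave
            if inserted_by in version_serials
            and deleted_by not in version_serials]

weave = [
    ("hello", 1, None),  # present since delta 1
    ("old",   1, 2),     # deleted in delta 2, so absent from v2's listing
    ("new",   2, None),  # inserted in delta 2
]
for serial, line in annotate(weave, {1, 2}):
    print(f"{serial}: {line}")   # 1: hello / 2: new
```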

> So from my point of view, the "weave" method looks pretty inadequate.

I'm sure it does; you are promoting xdelta, so anything else must be bad.
I am a bit curious, do you even know what a weave is? Can you explain
how one works? Surely, after all that flaming, you do know how they
work, right?

Larry McVoy            	 lm at  

This archive was generated by hypermail 2b29 : Fri Feb 15 2002 - 21:00:39 EST