Re: Kernel SCM saga..

From: Linus Torvalds
Date: Fri Apr 08 2005 - 19:14:25 EST




On Fri, 8 Apr 2005, Andrea Arcangeli wrote:
>
> We'd need a regenerated coherent copy of BKCVS to pipe into those SCM to
> evaluate how well they scale.

Yes, that makes most sense, I believe. Especially as BKCVS does the
linearization that makes other SCM's _able_ to take the data in the first
place. Few enough SCM's really understand the BK merge model, although the
distributed ones obviously have to do something similar.

> OTOH if your git project already allows storing the data in there,
> that looks nice ;).

I can express the data, and I did a sparse .git archive to prove the
concept. It doesn't even try to save BK-specific details, but as far as I
can tell, my git-conversion did capture all the basic things (ie not just
the actual source tree, but hopefully all the "who did what" parts too).

Of course, my git visualization tools are so horribly crappy that it is
hard to make sure ;)

Also, I suspect that BKCVS actually bothers to get more details out of a
BK tree than I cared about. People have pestered Larry about it, so BKCVS
exports a lot of the nitty-gritty (per-file comments etc) that just
doesn't actually _matter_, but people whine about. Me, I don't care. My
sparse-conversion just took the important parts.

> I don't yet fully understand how the algorithms of the trees are meant
> to work

Well, things like actually merging two git trees is not even something git
tries to do. It leaves that to somebody else - you can see what the
relationship is, and you can see all the data, but as far as I'm
concerned, git is really a "filesystem". It's a way of expression
revisions, but it's not a way of creating them.

> It looks similar to a diff -ur of two hardlinked trees

Yes. You could really think of it that way. It's not really about
hardlinking, but the fact that objects are named by their content does
mean that two objects (regardless of their type) can be seen as
"hardlinked" whenever their contents match.

But the more interesting part is the hierarchical virtual format it has,
ie it is not only hardlinked, but it also has the three different levels
of "views" into those hardlinked objects ("blob", "tree", "revision").

So even though the hash tree looks flat in the _physcal_ filesystem, it
detinitely isn't flat in its own virtual world. It's just flattened to fit
in a normal filesystem ;)

[ There's also a fourth level view in "trust", but that one hasn't been
implemented yet since I think it might as well be done at a higher
level. ]

Btw, the sha1 file format isn't actually designed for "rsync", since rsync
is really a hell of a lot more capable than my format needs. The format is
really designed for something like a offline http grabber, in that you can
just grab files purely by filename (and verify that you got them right by
running sha1sum on the resulting local copy). So think "wget".

Linus
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/