Re: [rfc] git: combo-blobs

From: Linus Torvalds
Date: Mon Apr 11 2005 - 11:05:33 EST




On Mon, 11 Apr 2005, Ingo Molnar wrote:
>
> if a repository is corrupted then it pretty much needs to be dropped
> anyway.

I disagree. Yes, the thing is designed to be replicated, so most of the
time the easiest thing to do is to just rsync with another copy.

But dammit, I don't want to just depend on that. I wrote "fsck" for a
reason. Right now it only finds errors, which is sufficient if you do the
rsync thing, but I think it's _wrong_ to

- be slower
- be more complex
- be less safe

to save some diskspace.

If you want to save disk-space, the current setup has a great way of doing
that: just drop old history. Exactly because a GIT repo doesn't do the
dependency chain thing, you can do that, and have a minimal GIT
repository that is still perfectly valid (and is basically the size of a
single checked-out tree, except it's also compressed).
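
To make the "no dependency chain" point concrete, here is a minimal sketch
of a content-addressed store in which every object is kept whole. It is a
toy and not GIT's actual on-disk format: the "kind\0payload" encoding, the
"<name> <filename>" tree lines, and the helper names (put, get,
drop_old_history) are all assumptions made up for illustration.

```python
import hashlib
import zlib


def put(store, kind, payload):
    """Store one whole, compressed object under the SHA-1 of its content."""
    data = kind.encode() + b"\0" + payload
    name = hashlib.sha1(data).hexdigest()
    store[name] = zlib.compress(data)
    return name


def get(store, name):
    """Return (kind, payload). No other object is needed to read this one."""
    kind, _, payload = zlib.decompress(store[name]).partition(b"\0")
    return kind.decode(), payload


def needed_for_checkout(store, commit):
    """Names needed to check out `commit`: the commit, its tree, its blobs."""
    keep = {commit}
    _, body = get(store, commit)
    # Toy commit format (assumption): first line is "tree <name>".
    stack = [body.decode().splitlines()[0].split()[1]]
    while stack:
        name = stack.pop()
        keep.add(name)
        kind, payload = get(store, name)
        if kind == "tree":
            # Toy tree format (assumption): one "<name> <filename>" per line.
            stack.extend(line.split()[0] for line in payload.decode().splitlines())
    return keep


def drop_old_history(store, commit):
    """Keep only what `commit` itself needs; parent commits are not followed."""
    keep = needed_for_checkout(store, commit)
    return {name: obj for name, obj in store.items() if name in keep}
```

Because no object is stored as a delta against another, the pruned store
still reads back every file of the newest tree. With delta chains, dropping
an old object could make a newer one unreadable.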

I don't think many people will do that, considering how cheap disk is, but
the fact is, GIT allows it just fine. "fsck" will complain right now, but
I'm actually going to make the "commit->commit" link be a "weaker" thing,
and have fsck not complain about missing history unless you do the "-v"
thing.

(Right now, for development, I _do_ want fsck to complain about missing
history, but that's a different thing. Right now it's there to make sure I
don't do stupid things, not for "users").
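
A rough sketch of what a "weaker" commit->commit link could look like in a
checker. This is not the real fsck code: the dictionary layout, the fsck()
function, and the verbose parameter (standing in for "-v") are hypothetical
names chosen for illustration only.

```python
def fsck(objects, verbose=False):
    """Return a list of problems found in a toy object table.

    `objects` maps a name to a dict with a "kind" key and, depending on the
    kind, "tree" and "parents" (commit) or "entries" (tree).
    """
    problems = []
    for name, obj in sorted(objects.items()):
        if obj["kind"] == "commit":
            # A commit without its tree is always an error.
            if obj["tree"] not in objects:
                problems.append(f"commit {name}: missing tree {obj['tree']}")
            # The parent link is "weak": only reported when asked to be verbose.
            for parent in obj.get("parents", []):
                if parent not in objects and verbose:
                    problems.append(f"commit {name}: missing parent {parent}")
        elif obj["kind"] == "tree":
            # A tree without one of its entries is always an error.
            for entry in obj.get("entries", []):
                if entry not in objects:
                    problems.append(f"tree {name}: missing entry {entry}")
    return problems
```

The design choice is that only links needed to reconstruct a checked-out
tree are hard errors, while missing ancestry is treated as information.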

> Also, with a 'replicate the full object on every 8th commit'
> rule the risk would be somewhat mitigated as well.

..but not the complexity.

The fact is, I want to trust this thing. Dammit, one reason I like GIT is
that I can mentally visualize the whole damn tree, and each step is so
_simple_. That's extra important when the object database itself is so
inscrutable - unlike CVS or SCCS or formats like that, it's damn hard to
visualize from looking at a directory listing.

So this really is a very important point for me: I want a demented
chimpanzee to be able to understand the GIT linkages, and I do not want
_any_ partial results anywhere. The recursive tree is already more
complexity than I wanted, but at least that seemed inescapable.

Linus