Re: Kernel SCM saga..

From: Roman Zippel
Date: Sat Apr 09 2005 - 12:15:21 EST


Hi,

On Fri, 8 Apr 2005, Linus Torvalds wrote:

> Yes. Per-file history is expensive in git, because if the way it is
> indexed. Things are indexed by tree and by changeset, and there are no
> per-file indexes.
>
> You could create per-file _caches_ (*) on top of git if you wanted to make
> it behave more like a real SCM, but yes, it's all definitely optimized for
> the things that _I_ tend to care about, which is the whole-repository
> operations.

Per file history is also expensive for another reason. The basic reason is
that I think that a hash based storage is not the best approach for SCM.
It's lacking locality, so the more it grows the more it has to seek to
collect all the data.
To reduce the space usage you could replace the parent file with a sha1
reference + delta to the new file. This is basically what monotone does
and might cause perfomance problems if you need to restore old versions
(e.g. if you want to annotate a file).

bye, Roman
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/