Mercurial 0.4b vs git patchbomb benchmark

From: Matt Mackall
Date: Fri Apr 29 2005 - 01:07:18 EST

Next message: Heiko Carstens: "[PATCH] s390: kexec fixes for -mm"
Previous message: Arnd Bergmann: "Re: [PATCH 3/4] ppc64: Add driver for BPA iommu"
In reply to: Andreas Gal: "Re: Mercurial 0.3 vs git benchmarks"
Next in thread: Sean: "Re: Mercurial 0.4b vs git patchbomb benchmark"
Messages sorted by: [ date ] [ thread ] [ subject ] [ author ]

On Mon, Apr 25, 2005 at 07:08:28PM -0700, Linus Torvalds wrote:
>
> To make an interesting benchmark, try applying the first 200 patches in
> the current git kernel archive. Can you do them three per second? THAT is
> the thing you should optimize for, not checking in huge changes.

Ok, I've optimized for it a bit. This is basically:

hg import -p1 -b ../broken-out `cat ../broken-out | grep -v #`

( latest code is at: http://selenic.com/mercurial/ )

My benchmark is to apply all 819 patches from -mm3 to 2.6.12-rc:

hg:

real 3m22.075s
user 1m57.195s
sys 0m14.068s

819/(60+57.195 + 14.068) = 6.239 patches/second user+sys
repository: before 167M after 173M (3.5% growth)

git:

real 2m58.568s
user 1m11.196s
sys 0m50.144s

819/(60+11.196+50.144) = 6.750 patches/second user+sys
repository: before 102M after 154M (51% growth)

Again, pretty close, time-wise. My code is actually spending a fair
amount of time doing delta compression in Python, which accounts for
most of the extra user time. So I think I can optimize most of that
away at some point. Interestingly hg is also using substantially less
system time.

What I'd like to highlight here is that git's repo is growing more
than 10 times faster. 52 megs is twice the size of a full kernel
tarball. And that's going to be the bottleneck for network pull
operations.

The fundamental problem I see with git is that the back-end has no
concept of the relation between files. This data is only present in
change nodes so you've got to potentially traverse all the commits to
reconstruct a file's history. That's gonna be O(top-level changes)
seeks. This introduces a number of problems:

- no way to easily find previous revisions of a file
(being able to see when a particular change was introduced is a
pretty critical feature)
- no way to do bandwidth-efficient delta transfer
- no way to do efficient delta storage
- no way to do merges based on the file's history[1]

Mercurial can grab look up and grab revisions of a file in O(1)
time/seeks. I haven't implemented annotate yet, but it can also be
done O(1) or O(file revisions).

[1] This last one is interesting. If we've got a repository with files A
and B:

M M1 M2

AB
|`-------v M2 clones M
aB AB file A is change in mainline
|`---v AB' file B is changed in M2
| aB / | M1 clones M
| ab/ | M1 changes B
| ab' | M1 merges from M2, changes to B conflict
| | A'B' M2 changes A
`---+--.|
| a'B' M2 merges from mainline, changes to A conflict
`--.|
??? depending on which ancestor we choose, we will have
to redo A hand-merge, B hand-merge, or both
but if we look at the files independently, everything
is fine

--
Mathematics is the supreme nostalgia of our time.
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/

Next message: Heiko Carstens: "[PATCH] s390: kexec fixes for -mm"
Previous message: Arnd Bergmann: "Re: [PATCH 3/4] ppc64: Add driver for BPA iommu"
In reply to: Andreas Gal: "Re: Mercurial 0.3 vs git benchmarks"
Next in thread: Sean: "Re: Mercurial 0.4b vs git patchbomb benchmark"
Messages sorted by: [ date ] [ thread ] [ subject ] [ author ]