Re: Kernel SCM saga..

From: Paul Jackson
Date: Sun Apr 10 2005 - 12:44:27 EST


Ingo wrote:
> With default gzip it's 3.3 seconds though,
> and that still compresses it down to 57 MB.

Interesting. I'm surprised how much a bunch of separate, modest-sized
files can be compressed.

I'm unclear what matters most here.

Space on disk certainly isn't much of an issue. Even with Andrew Morton
on our side, we still can't grow the kernel as fast as the disk drive
manufacturers can grow disk sizes.

The in-memory size of the compressed history matters to Linus and his
top 20 lieutenants, whose primary mission is patching the full kernel
source, only if they can't fit the source _history_ in main memory. But
those people are running 1 GByte or more of RAM - so whether it is 95,
57 or 45 MBytes, it fits fine. The rest of us are mostly concerned with
whether a kernel build fits in memory.

Looking at an arch i386 kernel build tree I have at hand, I see the
following disk usage:

  102 MBytes - BitKeeper/*
  287 MBytes - */SCCS/* (outside of already counted BitKeeper/*)
  232 MBytes - checked out source files
   94 MBytes - ELF and other build byproducts
  ---
  715 MBytes - Total

Converting from bk to git, I guess this becomes:

   97 MBytes - git (zlib)
  232 MBytes - checked out source files
   94 MBytes - ELF and other build byproducts
  ---
  423 MBytes - Total

Size matters when it's a two-to-one difference, but when we are down to
a 10% to 15% difference in the Total (57 + 232 + 94 = 383 MBytes with
gzip in place of zlib, versus the 423 above), it's presentation that
matters. The above numbers tell me that this is not a pure size issue
for local disk or memory usage.

What does matter that I can see:

1) Linus explicitly stated he wanted "a raw zlib compressed blob,
not a gzipped file", to encourage everyone to use the git tools to
access this data. He did not "want people editing repository files
by hand." I'm not sure what he gains here - it did annoy me for a
couple of hours before I decided fixing my supper was more important.

2) The time to compress will be noticed by users as a delay when
checking in changes (I'm guessing zlib compresses relatively faster).

3) The time to copy compressed data over the internet will be
noticed by users when upgrading kernel versions (gzip can
compress smaller).

4) Decompression times are smaller, so they don't matter as much.

5) Zlib has a nice library, and is patent free. I don't know about gzip.

6) As you note, zlib has the rsync-friendly, recovery-friendly
Z_PARTIAL_FLUSH (see the sketch after this list). I don't know
about gzip.
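
To make (1) and (6) concrete, here is a minimal sketch - my guess at
the shape of such code, not anything taken from git - of writing a
buffer out as a raw zlib stream, doing a Z_PARTIAL_FLUSH every so
often. The write_blob() name and the CHUNK interval are made up for
illustration:

	/* sketch: write buf[0..len) to "out" as a raw zlib stream;
	 * build with: cc -o blobz blobz.c -lz */
	#include <stdio.h>
	#include <string.h>
	#include <zlib.h>

	#define CHUNK 8192	/* flush interval - arbitrary here */

	int write_blob(FILE *out, unsigned char *buf, size_t len)
	{
		z_stream z;
		unsigned char obuf[CHUNK];
		int flush, ret;

		memset(&z, 0, sizeof(z));
		/* deflateInit() emits the zlib format (2 byte header,
		 * Adler-32 trailer) - "a raw zlib compressed blob,
		 * not a gzipped file" */
		if (deflateInit(&z, Z_DEFAULT_COMPRESSION) != Z_OK)
			return -1;

		do {
			size_t n = len > CHUNK ? CHUNK : len;
			z.next_in = buf;
			z.avail_in = n;
			buf += n;
			len -= n;
			/* a partial flush pushes out pending output
			 * without ending the stream, which is what
			 * makes it rsync- and recovery-friendly; the
			 * last piece is closed off with Z_FINISH */
			flush = (len == 0) ? Z_FINISH : Z_PARTIAL_FLUSH;
			do {
				z.next_out = obuf;
				z.avail_out = sizeof(obuf);
				ret = deflate(&z, flush);
				if (ret == Z_STREAM_ERROR) {
					deflateEnd(&z);
					return -1;
				}
				fwrite(obuf, 1, sizeof(obuf) - z.avail_out, out);
			} while (z.avail_out == 0);
		} while (flush != Z_FINISH);

		deflateEnd(&z);
		return 0;
	}

For what it's worth, swapping deflateInit() for deflateInit2() with a
windowBits of 16+MAX_WBITS would wrap the very same DEFLATE bits in the
gzip format instead - at this level that header/trailer wrapper is the
whole difference between the two formats.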

My guess is that Linus finds (2) and (3) to balance each other, and that
(1) decides the point, in favor of zlib. Well, that or a simpler
hypothesis, that he found the nice library (5) convenient, and (1)
sealed the deal, with the other tradeoffs passing through his
subconscious faster than he bothered to verbalize them.

You (Ingo) seem in your second message to be encouraging further
consideration of gzip, for its improved compression.

How will that matter to us, day to day?

--
I won't rest till it's the best ...
Programmer, Linux Scalability
Paul Jackson <pj@xxxxxxxxxxxx> 1.650.933.1373, 1.925.600.0401