Re: Transparent compression in the FS

From: Andrea Arcangeli
Date: Thu Oct 16 2003 - 11:29:54 EST


Hi Jeff,

On Wed, Oct 15, 2003 at 11:13:27AM -0400, Jeff Garzik wrote:
> Josh and others should take a look at Plan9's venti file storage method
> -- archival storage is a series of unordered blocks, all of which are
> indexed by the sha1 hash of their contents. This magically coalesces
> all duplicate blocks by its very nature, including the loooooong runs of
> zeroes that you'll find in many filesystems. I bet savings on "all
> bytes in this block are zero" are worth a bunch right there.

I had a few ideas on the above.

If the zero blocks are the problem, there's a tool called zum that nukes
them and replaces them with holes. I use it sometimes, for example:

andrea@velociraptor:~> dd if=/dev/zero of=zero bs=1M count=100
100+0 records in
100+0 records out
andrea@velociraptor:~> ls -ls zero
102504 -rw-r--r-- 1 andrea andrea 104857600 2003-10-16 18:24 zero
andrea@velociraptor:~> ~/bin/i686/zum zero
zero [820032K] [1 link]
andrea@velociraptor:~> ls -ls zero
0 -rw-r--r-- 1 andrea andrea 104857600 2003-10-16 18:24 zero
andrea@velociraptor:~>

If you can't find it, ask and I'll send it by email (it's GPL, btw).
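
For reference, the core trick such a tool uses is trivial. This is only a
minimal sketch of the technique, not the actual zum source: copy the file
block by block and lseek over the all-zero blocks so they end up as holes
in the destination.

/* sparsecp.c - minimal sketch of the hole-creating idea behind a tool
 * like zum (NOT the real zum source): copy src to dst, seeking over
 * all-zero blocks instead of writing them, so they become holes.
 */
#include <stdio.h>
#include <string.h>
#include <unistd.h>
#include <fcntl.h>
#include <sys/types.h>

#define BLKSIZE 4096

static int all_zero(const char *buf, ssize_t len)
{
	while (len--)
		if (*buf++)
			return 0;
	return 1;
}

int main(int argc, char **argv)
{
	char buf[BLKSIZE];
	ssize_t n;
	off_t size = 0;
	int in, out;

	if (argc != 3) {
		fprintf(stderr, "usage: %s <src> <dst>\n", argv[0]);
		return 1;
	}
	in = open(argv[1], O_RDONLY);
	out = open(argv[2], O_WRONLY | O_CREAT | O_TRUNC, 0644);
	if (in < 0 || out < 0) {
		perror("open");
		return 1;
	}
	while ((n = read(in, buf, BLKSIZE)) > 0) {
		if (all_zero(buf, n))
			lseek(out, n, SEEK_CUR);	/* leave a hole */
		else if (write(out, buf, n) != n) {
			perror("write");
			return 1;
		}
		size += n;
	}
	/* give the file its full length even if it ends in zeroes */
	if (ftruncate(out, size) < 0) {
		perror("ftruncate");
		return 1;
	}
	close(in);
	close(out);
	return 0;
}

zum itself works on the original file in place (as in the example above),
but the hole trick is the same.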

Hash-indexing the data is interesting, but 1) you lose the zerocopy
behaviour for the I/O: it's like computing a checksum over all the data going
to disk, something you normally would never do (except for the tiny files in
reiserfs with tail packing enabled, but that's not bulk I/O), and 2) I wonder
how much data is really duplicate besides the "zero" holes that are trivially
fixable in userspace (modulo bzImage or similar, where I'm unsure whether the
fs code in the bootloader can handle holes ;).
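
The second point is easy to measure from userspace without touching the fs
at all: print the hash of every block of a file and count how many hashes
repeat. A minimal sketch of that (a hypothetical helper, assuming OpenSSL's
libcrypto for SHA1(); build with -lcrypto), which you can run as
"./blkhash <file> | sort | uniq -dc" to see the duplicated blocks:

/* blkhash.c - print the SHA-1 of every 4k block of a file, one hex
 * digest per line, so duplication can be estimated with sort/uniq.
 */
#include <stdio.h>
#include <unistd.h>
#include <fcntl.h>
#include <openssl/sha.h>

#define BLKSIZE 4096

int main(int argc, char **argv)
{
	unsigned char buf[BLKSIZE], md[SHA_DIGEST_LENGTH];
	ssize_t n;
	int i, fd;

	if (argc != 2) {
		fprintf(stderr, "usage: %s <file>\n", argv[0]);
		return 1;
	}
	fd = open(argv[1], O_RDONLY);
	if (fd < 0) {
		perror("open");
		return 1;
	}
	while ((n = read(fd, buf, BLKSIZE)) > 0) {
		SHA1(buf, n, md);		/* digest of this block */
		for (i = 0; i < SHA_DIGEST_LENGTH; i++)
			printf("%02x", md[i]);
		putchar('\n');
	}
	close(fd);
	return 0;
}

The all-zero blocks will of course show up as the most duplicated digest;
whatever is left over is the real win a venti-like store would buy you.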