Re: Transparent compression in the FS

From: jw schultz
Date: Thu Oct 16 2003 - 18:13:34 EST


On Thu, Oct 16, 2003 at 05:41:34PM +0100, P@xxxxxxxxxxxxxx wrote:
> Andrea Arcangeli wrote:
> >Hi Jeff,
> >
> >On Wed, Oct 15, 2003 at 11:13:27AM -0400, Jeff Garzik wrote:
> >
> >>Josh and others should take a look at Plan9's venti file storage method
> >>-- archival storage is a series of unordered blocks, all of which are
> >>indexed by the sha1 hash of their contents. This magically coalesces
> >>all duplicate blocks by its very nature, including the loooooong runs of
> >>zeroes that you'll find in many filesystems. I bet savings on "all
> >>bytes in this block are zero" are worth a bunch right there.
> >
> >I had a few ideas on the above.
> >
> >if the zero blocks are the problem, there's a tool called zum that nukes
> >them and replaces them with holes. I use it sometime, example:
>
> Yes post processing with this tool is useful.
> Also note gnu cp (--sparse) inserts holes
> in files also.

As does cpio, rsync and many other archiving and data
transfer utilities.

> I thought a bit about this also and thought
> that in general the overhead of instant block/file
> duplicate merging is not worth it. Periodic
> post processing with a merging tool is
> much more efficient IMHO. Of course this is
> now only possible at the file level, but this
> is all that generally useful I think. I guess
> it's appropriate to plug my merging tool here :-)
> http://www.pixelbeat.org/fslint

What might be worth considering is internal NUL detection.
Build the block-of-zeros detection into the filesystem
write resulting in automatic creation of sparse files.
This could even work with extent based filesystems where
using hashes to identify shared blocks would not.

--
________________________________________________________________
J.W. Schultz Pegasystems Technologies
email address: jw@xxxxxxxxxx

Remember Cernan and Schmitt
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/