Re: Transparent compression in the FS

From: Ingo Oeser
Date: Fri Oct 17 2003 - 08:20:00 EST


On Thursday 16 October 2003 19:29, Larry McVoy wrote:
> On Wed, Oct 15, 2003 at 11:13:27AM -0400, Jeff Garzik wrote:
> > Josh and others should take a look at Plan9's venti file storage method
> > -- archival storage is a series of unordered blocks, all of which are
> > indexed by the sha1 hash of their contents. This magically coalesces
> > all duplicate blocks by its very nature, including the loooooong runs of
> > zeroes that you'll find in many filesystems. I bet savings on "all
> > bytes in this block are zero" are worth a bunch right there.
>
> The only problem with this is that you can get false positives. Val Hensen
> recently wrote a paper about this. It's really unlikely that you get false
> positives but it can happen and it has happened in the field.

Which is solvable by expanding your index with an enumerator for identical keys
but different content and have an overflow table exactly for this.

Handling hash table overflow is normal for hashing (even for
crypto-hashes). Why did they forget that? That's 2nd semester CS!

Regards

Ingo Oeser


-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/