Re: Transparent compression in the FS

From: Jörn Engel
Date: Fri Oct 17 2003 - 09:32:59 EST


On Thu, 16 October 2003 17:02:30 -0400, Jeff Garzik wrote:
>
> I'm curious if anyone has done any work on using multiple different
> checksums? For example, the cost of checksumming a single block with
> multiple algorithms (sha1+md5+crc32 for a crazy example), and storing
> each checksum (instead of just one sha1 sum), may be faster than reading
> the block off of disk to compare it with the incoming block. OTOH,
> there is still a mathematical possibility (however-more-remote) of a
> collision...

Would be interesting. Compare-by-hash rests on two assumptions:
a) a cryptographically strong hash and b) a sufficiently large hash
space. Since no one has yet proven a) for any hash, it is necessary to
store multiple hashes and simply ignore one of them as soon as that
particular hash is shown to be weak.
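
A minimal sketch of that idea in Python, assuming each block's record
holds one digest per algorithm. The names block_digests, same_block and
TRUSTED are hypothetical, made up for illustration and not taken from
any existing filesystem code:

```python
import hashlib
import zlib

# Hypothetical set of algorithms currently considered trustworthy.
# Dropping a name here "ignores" that hash without rehashing any block.
TRUSTED = {"sha1", "md5", "crc32"}

def block_digests(data: bytes) -> dict:
    """Compute every checksum that would be stored for one block."""
    return {
        "sha1": hashlib.sha1(data).hexdigest(),
        "md5": hashlib.md5(data).hexdigest(),
        "crc32": format(zlib.crc32(data) & 0xFFFFFFFF, "08x"),
    }

def same_block(a: dict, b: dict, trusted=TRUSTED) -> bool:
    """Compare-by-hash, consulting only the still-trusted algorithms."""
    if not trusted:
        raise ValueError("no trusted hash left to compare with")
    return all(a[name] == b[name] for name in trusted)
```

Once md5, say, is shown to be weak, calling same_block(a, b,
trusted={"sha1", "crc32"}) keeps working on the digests already stored.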

As a side effect, you could search for hash collisions this way. A
new block that has the same md5 hash as an existing one, but a
different sha1 and crc32 hash, tells you a lot. :)
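
Roughly, the collision search could look like the following Python
sketch. It assumes a record of three digests per block; digests and
suspect_collisions are hypothetical helper names used only here:

```python
import hashlib
import zlib

def digests(data: bytes) -> dict:
    """One digest per algorithm for a single block."""
    return {
        "sha1": hashlib.sha1(data).hexdigest(),
        "md5": hashlib.md5(data).hexdigest(),
        "crc32": format(zlib.crc32(data) & 0xFFFFFFFF, "08x"),
    }

def suspect_collisions(known: dict, new: dict) -> list:
    """Return the algorithms whose digests match although others differ.

    If every digest matches, the blocks are treated as identical and
    nothing is reported. If only some match, those algorithms have
    (very likely) produced a collision between two different blocks.
    """
    matching = [name for name in known if known[name] == new[name]]
    if len(matching) == len(known):
        return []
    return sorted(matching)
```

An empty list means either "same block" or "nothing in common"; a
result such as ["md5"] is the interesting case.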

Jörn

--
This above all: to thine own self be true.
-- Shakespeare