Re: Transparent compression in the FS
From: Jeff Garzik
Date: Thu Oct 16 2003 - 18:32:56 EST
jw schultz wrote:
Because each hash algorithm has different pathologies
multiple hashes are generally better than one but their
effective aggregate bit count is less than the sum of the
separate bit counts.
I was coming to this conclusion too... still, it's safer simply to
handle collisions.
The idea of this sort of block level hashing to allow
sharing of identical blocks seems attractive but i wouldn't
trust any design that did not accept as given that there
would be false positives. This means that a write would
have to not only hash the block but then if there is a
collision do a compare of the raw data. Then you have to
yep
add the overhead of having lists of blocks that match a hash
value and reference counts for each block itself. Further,
every block write would then need to include, at minimum,
relinking facilities on the hash lists and hash entry
allocations plus the inode block list being updated. Finally
Consider a simple key-value map, where "$hash $n" is the key and the
value is the block of data. Only three operations need occur:
* if hash exists (highly unlikely!), read and compare w/ raw data
* write new block to disk
* add single datum to key-value index
Inode block lists would need to be updated regardless of any collision;
that would be a standard and required part of any VFS interaction. And
the internal workings of the key-value index (think Berkeley DB) are
static, regardless of any collision.
Jeff
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/