Re: Finding hardlinks

From: Miklos Szeredi
Date: Thu Dec 28 2006 - 14:59:41 EST


> >> It seems like the posix idea of unique <st_dev, st_ino> doesn't
> >> hold water for modern file systems
> >
> > are you really sure?
>
> Well Jan's example was of Coda that uses 128-bit internal file ids.
>
> > and if so, why don't we fix *THAT* instead
>
> Hmm, sometimes you can't fix the world, especially if the filesystem
> is exported over NFS and has a problem with fitting its file IDs uniquely
> into a 64-bit identifier.

Note that it's pretty easy to fit _anything_ into a 64-bit identifier
by using a good hash function. The chance of an accidental collision
is vanishingly small. Assuming uniformly distributed hash values, the
probability of at least one collision is roughly, for a set of

100 files: 0.00000000000003%
1,000,000 files: 0.000003%
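
As a rough illustration of the figures above, here is a minimal Python
sketch. It is not taken from any filesystem: the names fold_to_64 and
collision_probability are made up for this example, BLAKE2b is just one
possible choice of "good hash function", and the probability uses the
standard birthday approximation under the assumption that the hash
output is uniformly distributed.

```python
import hashlib
import math


def fold_to_64(file_id: bytes) -> int:
    """Hash an arbitrary-length file ID (e.g. 128 bits) down to 64 bits.

    Hypothetical helper: BLAKE2b with an 8-byte digest is an assumed
    stand-in for "a good hash function".
    """
    digest = hashlib.blake2b(file_id, digest_size=8).digest()
    return int.from_bytes(digest, "big")


def collision_probability(n_files: int, bits: int = 64) -> float:
    """Birthday approximation: p ~= 1 - exp(-n(n-1) / 2^(bits+1))."""
    pairs = n_files * (n_files - 1) / 2
    # expm1 keeps precision when the probability is tiny.
    return -math.expm1(-pairs / 2.0 ** bits)


if __name__ == "__main__":
    for n in (100, 1_000_000):
        print(f"{n:>9,} files: {collision_probability(n) * 100:.1e}%")
```

The printed percentages should land on the same order of magnitude as
the two figures quoted above.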

And usually applications (tar, diff, cp -a, etc.) work with a very
limited set of st_ino's. An app that stored a million st_ino values
and compared each new one to all the existing ones would have severe
performance problems and yet _almost never_ come across a false positive.
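
For context, the kind of check such tools perform can be sketched as
follows. This is an assumed, simplified illustration in Python, not the
actual logic of tar, diff or cp; find_hardlinks is a hypothetical name.
It keys a dictionary on <st_dev, st_ino>, so each lookup is a hash-table
probe rather than a comparison against every stored value.

```python
import os
import stat


def find_hardlinks(paths):
    """Return (path, earlier_path) pairs that share a <st_dev, st_ino> key."""
    seen = {}   # (st_dev, st_ino) -> first path seen with that identity
    links = []
    for path in paths:
        st = os.lstat(path)
        # Only non-directories with more than one link can be hardlinks,
        # which keeps the set of remembered st_ino values small.
        if stat.S_ISDIR(st.st_mode) or st.st_nlink < 2:
            continue
        key = (st.st_dev, st.st_ino)
        if key in seen:
            links.append((path, seen[key]))
        else:
            seen[key] = path
    return links
```

A false positive here would mean two distinct files with st_nlink > 1 on
the same device being reported with the same st_ino.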

Miklos