Re: Kernel SCM saga..

From: Paul Jackson
Date: Sat Apr 09 2005 - 10:19:18 EST


Linus wrote:
> you need to reuse the same inode/dev numbers
> (again - I didn't worry about portability, and filesystems where those
> aren't stable are a "don't do that then")

On filesystems that don't have a stable inode number, I use the md5sum
of the full (relative to mount point) pathname as the inode number.

Since these same file systems (not surprisingly) lack hard links as
well, the pathname _is_ essentially the stable inode number.


Off-topic details ...

This is on my backup program, which does a full snapshot of my 90 Gb
system, including some FAT file systems, in 6 or 7 minutes, plus time
proportional to actual changes. I have given up finding a backup
program I can tolerate, and write my own. It stores each md5sum unique
blob exactly once, but uses the same sort of tricks you describe to
detect changes from examining just the stat information so as to avoid
reading every damn byte on the disk. It works with smb, fat, vfat,
ntfs, reiserfs, xfs, ext2/3, ... A single manifest file, in plain
ascii, one file per line, captures a full snapshot, disk-to-disk, every
few hours.

This comment from my backup source explains more:

# Unfortunately, fat, vfat, smb, and ncpfs (Netware) file systems
# do not have unique disk-based persistent inode numbers.
# The kernel constructs transient inode numbers for inodes
# in its cache. But after an umount and re-mount, the inode
# numbers are all different. So we would end up recalculating
# the md5sums of all files in any such file systems.
#
# To avoid this, we keep track of which directories are on such
# file systems, and for files in any such directory, instead
# of using the inode value from stat'ing a file, we use the
# md5sum of its path as a pseudo-inode number. This digest of
# a file's path has improved persistance over it's transiently
# assigned inode number. Fields 5,6,7 (files total, free and
# avail) happen to be zero on file systems (fat, vfat, smb,
# ...) with no real inodes, so we we use this fallback means
# of getting a persistent pseudo-inode if a statvfs() call on
# its directory has fields 5,6,7 summing to zero:
# sum(os.statvfs(dir)[5:8]) == 0
# We include that dir in the fat_directories set in this case.

fat_directories = sets.Set() # set of directory paths on FAT file systems

# The Python statvfs() on Linux is a tad expensive - the
# glibc statvfs(2) code does several system calls, including
# scanning /proc/mounts and stat'ing its entries. We need
# to know for each file whether it is on a "fat" file system
# (see above), but for efficiency we only statvfs at mount
# points, then propagate the file system type from there down.

mountpoints = [m.split()[1] for m in open("/proc/mounts")]



--
I won't rest till it's the best ...
Programmer, Linux Scalability
Paul Jackson <pj@xxxxxxxxxxxx> 1.650.933.1373, 1.925.600.0401
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/