Re: Proposal: restrict link(2)

bofh@snoopy.virtual.net.au
Tue, 31 Dec 96 13:05:13 +1000


>SC> Well, on THAT system we had ~400 accounts on a 300M partition.
>SC> Even with 1M quota, one was more likely to run out of physical
>SC> disk space.

>The problem seems to be that hard links are useful to conserve disk space.
>Well how about implementing copy-on-write semantics in the file system? That
>is, cp foo bar should not duplicate any file data, but only a little metadata.
>That should be transparent to user programs, but it would have the distinct
>advantage that everyone knows the semantics of cp, while it seems people are
>getting pretty confused about ln.

Firstly, to do copy-on-write in the file system you need cp functionality in
the filesystem and in the kernel VFS interface. I think that this is a good
thing for a number of other reasons and that it should be done. OS/2 has an API
call DosCopy() to tell the kernel to copy a file on behalf of the application.
This frees the application programmer from bothering about EAs etc., and saves
a lot of unnecessary OS calls (instead of a large number of DosRead() and
DosWrite() calls you have a single API call to copy all the data). I believe
that we would be better off if we had this functionality in Linux - even if
only cp used it (and we would have to write an xcp program, similar to the DOS
and OS/2 XCOPY, to copy sub-directories - `tar -cf - | (cd target ; tar
-xvf -)` won't be able to take advantage of copy-on-write).
Now if we implement copy-on-write in a file system we would need to store
metadata to inform the FS driver of the state of the file. The easiest way of
doing this would be something similar to a hard link - but it would require
copying the whole file if any writing is to be done. This would still save a
considerable amount of disk space, because in my experience either a file is
not written to at all after copying, or most of it is rewritten. I think that
it's fairly uncommon to copy a large file and then write to only a small part
of it - however I'm sure that there are people out there who run different
programs from me and would find that this is not true of their systems.
	At the moment I am designing a new file system. Its major feature is
that a number of hard drives can be combined into a "partition set" to build
the file system on, and that partitions may be added to and removed from the
set dynamically at run-time without interfering with running processes. To do
this I need to be able to find out which file uses a particular block (so I
can update the inode when I move the data from the block onto another
partition). I would like to add the copy-on-write facility to the file system,
but I suspect that maintaining an index from each block to a variable number
of inodes might add too much overhead (both in disk space for the indexes and
in computation time when the file system is in use) to be worthwhile.

For some brief info on the file system I'm designing have a look at
http://www.virtual.net.au/~rjc/enh-fs.html.

Russell Coker