vfile_t (last hint pre 0.2)

Clayton Weaver (cgweav@eskimo.com)
Mon, 10 Nov 1997 00:36:38 -0800 (PST)


Ok, ok, someone might want to use a >2gb virtual file on a cluster of
32-bit pcs ("get these fleas away from me"). I don't really expect that
anyone implementing such an app would do a less efficient job with the
code than I would, but here are a couple of other things to do:

When you open a brand new vfile_t, you could open more than one
filesegment up front and maybe save a disk access or two when crossing
filesegment boundaries. How does this affect seek time?

You can add an int fd file descriptor member to the VSEG struct,
initialize it to 0xffffffff (that is, -1 in a 32-bit int) or something
when the array is set up at app startup, and keep track of file
descriptor numbers for open VSEG files right in the array. This would
probably be convenient for the open() calls, etc, that a vfopen() and
similar would use internally. Then you can do a not-equal-0xffffffff
check on the fd to see if the VSEG file is open() already, instead of a
mask and inequality check (you lose one mask operation on the open file
check).
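
A minimal sketch of that layout, under some assumptions of mine: the
struct tag vseg, the array size NVSEG, the name VSEG_CLOSED, and the
helper names are all hypothetical, vfseg is taken to be a file name, and
OPENFL is taken to be LONG_MAX. I write the sentinel as -1, which has
the same bits as 0xffffffff in a 32-bit int.

```c
#include <assert.h>
#include <limits.h>
#include <stddef.h>

#define NVSEG       8          /* hypothetical number of filesegments */
#define VSEG_CLOSED (-1)       /* 0xffffffff when int is 32 bits wide */
#define OPENFL      LONG_MAX   /* 0x7fffffff or 0x7fffffffffffffff */

struct vseg {
    const char *vfseg;   /* filesegment path (assumed to be a file name) */
    long        offset;  /* bytes populated, or OPENFL if unpopulated */
    int         fd;      /* descriptor from open(), or VSEG_CLOSED */
};

static struct vseg VSEG[NVSEG];

/* Mark every slot closed and unpopulated; call once when the app starts. */
static void vseg_init_all(void)
{
    for (int i = 0; i < NVSEG; i++) {
        VSEG[i].vfseg  = NULL;
        VSEG[i].offset = OPENFL;
        VSEG[i].fd     = VSEG_CLOSED;
    }
}

/* One comparison, no mask: is this filesegment currently open? */
static int vseg_is_open(int index)
{
    assert(index >= 0 && index < NVSEG);
    return VSEG[index].fd != VSEG_CLOSED;
}
```

A vfopen() could then store the result of open() straight into
VSEG[index].fd and test vseg_is_open(index) before opening again.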

When you close a VSEG that has data, you do

VSEG[index].fd = 0xffffffff; /* indicates closed file */

and leave VSEG[index].offset alone. If you truncate the vfile_t down
to where you can completely unlink a VSEG[index].vfseg, then you
unlink it and do

VSEG[index].offset = OPENFL; /* 0x7fffffff or 0x7fffffffffffffff */

to show in memory that this filesegment is not populated with data.
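
The two cases might look something like this. It is only a sketch: the
helper names vseg_close() and vseg_drop(), the struct layout, and the
-1 sentinel (same bits as 0xffffffff in a 32-bit int) are my
assumptions, and OPENFL is taken to be LONG_MAX.

```c
#include <assert.h>
#include <limits.h>
#include <stddef.h>
#include <unistd.h>

#define VSEG_CLOSED (-1)       /* 0xffffffff when int is 32 bits wide */
#define OPENFL      LONG_MAX   /* 0x7fffffff or 0x7fffffffffffffff */

struct vseg {
    const char *vfseg;   /* filesegment path (assumed to be a file name) */
    long        offset;  /* bytes populated, or OPENFL if unpopulated */
    int         fd;      /* open descriptor, or VSEG_CLOSED */
};

/* Close a filesegment that still holds data: only fd changes,
 * offset is left alone. Closing an already closed slot is a no-op. */
static int vseg_close(struct vseg *s)
{
    int rc = 0;

    assert(s != NULL);
    if (s->fd != VSEG_CLOSED) {
        rc = close(s->fd);
        s->fd = VSEG_CLOSED;   /* indicates closed file */
    }
    return rc;
}

/* Truncation has emptied this filesegment: unlink the file and
 * mark the slot unpopulated. Returns 0 on success, -1 on failure. */
static int vseg_drop(struct vseg *s)
{
    assert(s != NULL && s->vfseg != NULL);
    (void)vseg_close(s);
    if (unlink(s->vfseg) != 0)
        return -1;             /* file still exists, keep offset */
    s->offset = OPENFL;
    return 0;
}
```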

If you are using more than one 2gb file, an offset for app-specific
metadata at the front of VSEG[0].vfseg is a drop in the bucket in terms of
what percentage of your disk space it uses, so why not use it? Keep VFSIZE
(size of the filesegments) and the total voff_t to VSEEK_END (SEEK_END for
last populated VSEG[index].vfseg) in there, and when you reopen the whole
set from a different instance of the application or a forked child or
another thread, you can compute how many files in the array are populated
and what the offset to the first free space is from those saved values,
and not have to call stat() at all to get the sizes. Every
VSEG[index].offset for an index (0 indexed) below voff_t total/VFSIZE is
VFSIZE, since those filesegments are full.
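
The arithmetic can be sketched as below. Assumptions of mine: voff_t is
a 64-bit long long, total counts every byte in the set (including any
header at the front of VSEG[0].vfseg), seg_offset[] stands in for the
VSEG[index].offset members, and NVSEG and vseg_rebuild() are
hypothetical names.

```c
#include <assert.h>
#include <limits.h>

#define NVSEG  8          /* hypothetical number of filesegments */
#define OPENFL LONG_MAX   /* 0x7fffffff or 0x7fffffffffffffff */

typedef long long voff_t; /* assumed 64-bit virtual offset type */

static long seg_offset[NVSEG];  /* stands in for VSEG[index].offset */

/* Rebuild the per-segment offsets from the two saved values.
 * Returns the number of populated filesegments, or -1 if the
 * values do not fit the array. */
static int vseg_rebuild(long vfsize, voff_t total)
{
    if (vfsize <= 0 || total < 0)
        return -1;

    voff_t full = total / vfsize;        /* completely full segments */
    long   rem  = (long)(total % vfsize); /* bytes in the last one */

    if (full > NVSEG || (full == NVSEG && rem > 0))
        return -1;

    for (int i = 0; i < NVSEG; i++) {
        if (i < full)
            seg_offset[i] = vfsize;      /* full filesegment */
        else if (i == full && rem > 0)
            seg_offset[i] = rem;         /* last, partly filled */
        else
            seg_offset[i] = OPENFL;      /* not populated */
    }
    return (int)full + (rem > 0);
}
```

With these assumptions the first free byte is in filesegment
total / VFSIZE at local offset total % VFSIZE.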

You can set the whole thing up and go with one little read() from the
first file. And of course the app might want to keep some metadata of its
own in there (ACL's, a chksum on data content per VSEG[index].vfseg that a
modified backup program computes and writes in there whenever it runs,
versioning, whatever).
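
Something like the following would do for the header itself. The struct
vfile_hdr, its field names, and the two helper names are hypothetical,
and voff_t is assumed to be a 64-bit long long. The struct is written
raw in native byte order, so this sketch only works between machines
with the same layout.

```c
#include <assert.h>
#include <fcntl.h>
#include <stddef.h>
#include <unistd.h>

typedef long long voff_t;   /* assumed 64-bit virtual offset type */

/* Hypothetical header at the front of VSEG[0].vfseg. */
struct vfile_hdr {
    long long vfsize;   /* VFSIZE: size of each filesegment */
    voff_t    total;    /* total offset to VSEEK_END */
};

/* Save the header at offset 0 of the first filesegment.
 * No O_TRUNC, so data already in the file is left in place. */
static int vfile_save_hdr(const char *path, const struct vfile_hdr *h)
{
    assert(path != NULL && h != NULL);
    int fd = open(path, O_WRONLY | O_CREAT, 0600);
    if (fd < 0)
        return -1;
    ssize_t n = write(fd, h, sizeof *h);
    if (close(fd) != 0 || n != (ssize_t)sizeof *h)
        return -1;
    return 0;
}

/* Load it back with a single read() from the first filesegment. */
static int vfile_load_hdr(const char *path, struct vfile_hdr *h)
{
    assert(path != NULL && h != NULL);
    int fd = open(path, O_RDONLY);
    if (fd < 0)
        return -1;
    ssize_t n = read(fd, h, sizeof *h);
    close(fd);
    if (n != (ssize_t)sizeof *h)
        return -1;                 /* short read or error */
    if (h->vfsize <= 0 || h->total < 0)
        return -1;                 /* values make no sense */
    return 0;
}
```

App-specific metadata (ACL's, checksums, versioning) could follow these
two fields in the same header.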

Have fun.

Regards, Clayton Weaver cgweav@eskimo.com (Seattle)