Re: Suggestion: a garbage-collected file system

From: Sean Hunter (sean@uncarved.co.uk)
Date: Tue Jan 11 2000 - 05:58:06 EST


On Mon, Jan 10, 2000 at 04:26:40PM +0100, David Madore wrote:
> Hi everybody.
>
> There's something I would like to try implementing, and that is a
> garbage-collected file system for Linux.
>
> Such a filesystem (``gcfs'') would not use reference-counting (as is
> done on inodes) to decide when to free disk resources, but rather, it
> would garbage-collect the directory structure.

Has it occurred to you that using reference counting to decide when to
free disk resources _is_ garbage collection?
 
> The advantage of this is that in such a filesystem we can remove some
> of the irritating limitations that Unix filesystems traditionally are
> subject to because of the requirements imposed by the
> reference-counting strategy.
>
> In plain English: in the gcfs, hard links are permitted on
> directories; is is permitted to move or link a directory to a
> subdirectory of itself (thus creating hard cycles in the filesystem);

mkdir a
mkdir a/b
ln a/b c
mkdir c/d
mv a c/d

Now '.' contains just c, which contains d/a/b , which contains d/a/b
etc ad infinitum.

Nice.

D, A and B all contain themselves. Very Zen. This will send just
about every unix tool known to man in an infinite loop. Export _that_
via NFS, if you think you're hard enough.

Now try

mv c/d/a c/d/a/b

...and a pops up in c/ ! And in c/d/a/b and in c/a/d/a/b/a/d/a/b (I
think...) and probably in a few (infinite) other places as well.

Cool.

Still into this? Try
 
ln c/d/a/b/d/a/b e
mkdir e/f

(Exercise for the reader: where should "f" appear?)

mv c e/f

Genuinely baroque.

> and it is permitted to remove a non-empty directory (this creates a
> ``detached'' subtree that is no longer reachable from the root inode,
> and that will be garbage-collected when all the open files and working
> directories accessing the subtree are closed or removed).

Err, ok

User "sam" is a trusting soul, and gives me permission to read files
in her "shared" directory. After a while we do more work together,
and the "shared" directory gets moved into my directory.

so my "work" directory contains sam's "shared" directory. Now I can
remove my "work" directory (because I own it), and because the gcfs
doesn't mind me deleteing a dir that still contains files, Sam won't
be able to reference her files via the root filesystem. To cap it
all, when she quits that long-running emacs session that has them all
open, the kernel will helpfully garbage-collect them for her.

Great! (Extrapolate to root if you're not too bored already)

>
> There are many uses (some interesting and some simply very strange)
> that a gcfs could be put to; but really I'm interested in this because
> it sounds so cool.

It does, doesn't it!

> There are essentially two classes of problems with this gcfs. Number
> one, determine precisely what the garbage-collecting algorithm should
> be. Number two, find how to adapt the Unix filesystem semantics to
> garbage-collected data, and vice versa.
>
> For number one, the garbage-collection algorithm I was thinking of
> using is Baker's treadmill algorithm (with a write barrier), which is
> documented in the book ``Garbage Collection: Algorithms for Automatic
> Dynamic Memory Management'' by Jones and Lins (published by Wiley) in
> section 8.8; I will describe it in more detail if I find some people
> are interested in my proposal. It is a non-moving concurrent
> garbage-collector.

Make sure it can handle infinitely nested circular references like the
above example. That was a trivial case off the top of mind head. I'm
sure a genuinely devious mind like Al Viro would be able to come up
with some much nastier ones.

>
> A lot of implementation details have to be settled. For example, the
> treadmill algorithm works by organizing the data blocks (disk blocks
> in this case) in a doubly linked circular list: where should the
> points for this doubly linked list be stored?

This is a very good question. Perhaps it would not be too much to
allocate some precious unswappable kernel memory to put every disk
block in your filesystem in a circular list? The beauty of the
circular linked list in this case is, of course that it will have lots
of smaller circular links within it. Worlds within worlds...

Sean

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@vger.rutgers.edu
Please read the FAQ at http://www.tux.org/lkml/



This archive was generated by hypermail 2b29 : Sat Jan 15 2000 - 21:00:17 EST