Re: Suggestion: a garbage-collected file system

From: David Madore (madore@clipper.ens.fr)
Date: Tue Jan 11 2000 - 09:56:31 EST


On Tue, Jan 11, 2000 at 10:58:06AM +0000, Sean Hunter wrote:
> Has it occurred to you that using reference counting to decide when to
> free disk resources _is_ garbage collection?

Yes. In fact, that's how I got the idea. But the reasoning runs the
other way: it's not "Let's make a filesystem that uses a full gc
rather than limited reference counting", it's "Let's make a filesystem
where we can have circular structures and other Nasty Things - then of
course we will need a gc on that".
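
To make the point concrete, here is a toy user-space illustration
(plain C; every name in it is invented for the example and has nothing
to do with any actual gcfs code) of why reference counting alone
cannot reclaim a cycle of directories once it has become unreachable
from the root:

    /* Toy illustration (everything here is invented for the example,
     * none of it is gcfs code): a cycle of directories keeps nonzero
     * link counts even once it is unreachable from the root. */
    #include <stdio.h>
    #include <stdlib.h>

    struct dir {
        const char *name;
        int nlink;              /* how many directory entries point at us */
        struct dir *child;      /* a single child slot is enough here     */
    };

    static struct dir *mkdir_node(const char *name)
    {
        struct dir *d = calloc(1, sizeof *d);
        d->name = name;
        return d;
    }

    static void link_dir(struct dir *parent, struct dir *child)
    {
        parent->child = child;
        child->nlink++;
    }

    static void unlink_dir(struct dir *parent)
    {
        if (parent->child) {
            parent->child->nlink--;
            parent->child = NULL;
        }
    }

    int main(void)
    {
        struct dir *root = mkdir_node("/");
        struct dir *a = mkdir_node("a");
        struct dir *b = mkdir_node("b");

        link_dir(root, a);      /* /a                        */
        link_dir(a, b);         /* /a/b                      */
        link_dir(b, a);         /* /a/b/a -- a cycle a <-> b */

        unlink_dir(root);       /* drop the only path from the root */

        /* a and b are now unreachable, yet both still have nlink == 1,
         * so pure reference counting will never free them; only a
         * collector tracing from the root can notice they are garbage. */
        printf("a: nlink=%d, b: nlink=%d\n", a->nlink, b->nlink);
        return 0;
    }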

> > In plain English: in the gcfs, hard links are permitted on
> > directories; it is permitted to move or link a directory to a
> > subdirectory of itself (thus creating hard cycles in the filesystem);
>
> mkdir a
> mkdir a/b
> ln a/b c
> mkdir c/d
> mv a c/d
>
> Now '.' contains just c, which contains d/a/b , which contains d/a/b
> etc ad infinitum.

Yes. And c/.. is not . but rather c/d/a (actually, ".." is pretty
meaningless in a gcfs).

> D, A and B all contain themselves. Very Zen. This will send just
> about every unix tool known to man in an infinite loop. Export _that_
> via NFS, if you think you're hard enough.

I've had a look at the Hurd, and they have the same kind of problems:
Unix tools (even when made by GNU) will assume Unix filesystem
semantics, and these are not necessarily verified under the Hurd.

When you run 'find' on a user's directory under the Hurd, you are in
the same situation as running a web crawler on a person's site: they
decide what you will see under that directory/site, and you have no
guarantee that it will obey *any* kind of filesystem semantics, or
finiteness conditions.

(see http://www.eleves.ens.fr:8080/cgi-bin/infinity/ for example).

> Now try
>
> mv c/d/a c/d/a/b
>
> ...and a pops up in c/ ! And in c/d/a/b and in c/a/d/a/b/a/d/a/b (I
> think...) and probably in a few (infinite) other places as well.

Actually, it's worse than that: a/ has disappeared from c/d/ because
you just moved it. So now it looks like this: c/ contains d/ and a/,
c/d/ is empty, and c/a/ contains b/, which points back to c/.

> Genuinely baroque.

It's more like the other way around: Unix filesystems have very rigid
semantics, which we have gotten accustomed to, but they are by no
means necessary. The Web of URL's has much (much!) looser semantics,
for example.

> > and it is permitted to remove a non-empty directory (this creates a
> > ``detached'' subtree that is no longer reachable from the root inode,
> > and that will be garbage-collected when all the open files and working
> > directories accessing the subtree are closed or removed).
>
> Err, ok
>
> User "sam" is a trusting soul, and gives me permission to read files
> in her "shared" directory. After a while we do more work together,
> and the "shared" directory gets moved into my directory.

Ah, but that is your mistake: why would it get 'moved' into your
directory? If it does, that implies that sam is no longer using it;
otherwise she would have kept a link to it. I.e. the 'shared'
directory would have been hard-linked into yours, not moved there.

A 'move' operation is almost equivalent to link-and-remove, i.e. "mv
foo some/place/" (no matter whether "foo" is a file or a directory) is
almost the same as "ln foo some/place/; rm(dir)? foo" (the 'almost'
part comes from the fact that, if foo is a directory, in the first
case foo/.. will point to some/place/ whereas in the second case it
will still point to .). So you must think of the move as two
operations: first I decide I am interested in sam's 'shared' directory
(note that I do not need her permission to make that link), then sam
decides that she does not want it any more.
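
Here is a minimal user-space sketch of that decomposition (again plain
C, all names invented for illustration, nothing to do with real kernel
code): the destination links first, the source unlinks afterwards, and
the object stays alive throughout.

    /* A minimal sketch of "mv = link + remove"; names are invented. */
    #include <stddef.h>

    struct obj { int nlink; };              /* the thing being moved   */

    struct entry { struct obj *target; };   /* one directory entry     */

    static void toy_link(struct entry *dst, struct obj *o)
    {
        dst->target = o;
        o->nlink++;                 /* the destination takes an interest */
    }

    static void toy_unlink(struct entry *src)
    {
        if (src->target) {
            src->target->nlink--;   /* the source lets go                */
            src->target = NULL;
        }
    }

    static void toy_move(struct entry *src, struct entry *dst)
    {
        struct obj *o = src->target;
        toy_link(dst, o);           /* step 1: I link sam's 'shared' dir */
        toy_unlink(src);            /* step 2: sam removes her link      */
    }

    int main(void)
    {
        struct obj shared = { 1 };                  /* sam's 'shared' dir */
        struct entry sams = { &shared }, mine = { NULL };

        toy_move(&sams, &mine);                     /* "mv shared ~me/"   */
        return shared.nlink == 1 ? 0 : 1;           /* still alive        */
    }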

> so my "work" directory contains sam's "shared" directory. Now I can
> remove my "work" directory (because I own it), and because the gcfs
> doesn't mind me deleting a dir that still contains files, Sam won't
> be able to reference her files via the root filesystem. To cap it
> all, when she quits that long-running emacs session that has them all
> open, the kernel will helpfully garbage-collect them for her.
>
> Great! (Extrapolate to root if you're not too bored already)

It *is* great, because that is exactly the kind of fs semantics we
*want*: if I am interested in an object, I can make a hard link to
that object, and it will remain alive so long as I do not remove the
link, or, rather, so long as there are live links to that object.
This is already the semantics we have for files (replace "'shared'
directory" with "'shared' file" throughout your discussion), and
I am just proposing to extend it to directories.

> > There are many uses (some interesting and some simply very strange)
> > that a gcfs could be put to; but really I'm interested in this because
> > it sounds so cool.
>
> It does, doesn't it!

It really does. To be honest, I think it is completely useless, but
it certainly sounds cool. You have made that all the more obvious.

['it' = 'the VFS']
> Make sure it can handle infinitely nested circular references like the
> above example. That was a trivial case off the top of my head. I'm
> sure a genuinely devious mind like Al Viro would be able to come up
> with some much nastier ones.

IMHO, the VFS should never have to worry about that sort of thing.
It only cares about two things: inodes, which are real references to
objects (resources) in the filesystem, and dentries, which name a way
of getting there. It is not supposed to know anything about the state
of the filesystem, and in fact that state can change all the time.

But I do not know whether my Humble Opinion actually reflects the way
the VFS is organized in Linux. :-(
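
For what it's worth, the picture I have in mind is roughly this (a
deliberately over-simplified sketch in C, *not* the real Linux VFS
structures):

    /* Over-simplified sketch, not the real Linux VFS: a dentry is just
     * a name for one way of reaching an inode, and the inode is the
     * actual object.  Nothing at this level requires the namespace
     * below a filesystem to be a tree, or acyclic, or even finite. */
    struct toy_inode {
        unsigned long ino;          /* the object (resource) itself          */
        unsigned int  refs;         /* dentries / open files referencing it  */
    };

    struct toy_dentry {
        char               name[256];  /* one path component                 */
        struct toy_dentry *parent;     /* where this name was looked up from */
        struct toy_inode  *inode;      /* the object the name leads to       */
    };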

> > A lot of implementation details have to be settled. For example, the
> > treadmill algorithm works by organizing the data blocks (disk blocks
> > in this case) in a doubly linked circular list: where should the
> > pointers for this doubly linked list be stored?
>
> This is a very good question. Perhaps it would not be too much to
> allocate some precious unswappable kernel memory to put every disk
> block in your filesystem in a circular list? The beauty of the
> circular linked list in this case is, of course that it will have lots
> of smaller circular links within it. Worlds within worlds...

I was thinking of storing the links on disk rather than in memory.
The disadvantage is that the fs is more easily subject to corruption.
The advantage is that the mount operation is much easier (the links
do not all have to be reconstructed). On disk, they could be either
within each block, or in a large table of blocks at the beginning of
the filesystem.
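
Just to make the on-disk option concrete, here is one entirely
hypothetical layout (none of this is implemented); the alternative
would be to gather the prev/next pairs into one big table near the
superblock instead of putting them inside each block:

    /* Entirely hypothetical on-disk layout for the "links within each
     * block" variant: every data block carries the block numbers of
     * its neighbours in the treadmill's circular list. */
    #include <stdint.h>

    #define GCFS_BLOCK_SIZE 4096u

    struct gcfs_block {
        uint32_t prev_blk;          /* previous block in the circular list */
        uint32_t next_blk;          /* next block in the circular list     */
        uint8_t  data[GCFS_BLOCK_SIZE - 2 * sizeof(uint32_t)];
    };

    /* The treadmill's colour sets (free, from-space, to-space, scanned)
     * are then just four marker positions on this single circular list;
     * moving a block between sets is a constant-time relink of
     * prev_blk/next_blk, at the cost of a few extra block writes. */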

There's still much to be thought about.

Happy hacking,

-- 
     David A. Madore
    (david.madore@ens.fr,
     http://www.eleves.ens.fr:8080/home/madore/ )
