Re: (reiserfs) reiserfs and knfsd and NFSv4 and volatile file handles

From: Andi Kleen (
Date: Thu Mar 16 2000 - 06:02:39 EST

On Thu, Mar 16, 2000 at 03:10:01AM -0500, Alexander Viro wrote:
> On Wed, 15 Mar 2000, Hans Reiser wrote:
> > Not sure I understand your question, but I'll take a stab at it. Currently the
> > first directory a file is created in stays its packing locality parent for the
> > life of the file. The packing locality parent is the first key component. So,
> > currently, if you know the parent directory id the file was created in you know
> > the first component of the key, and the last component of the key is the offset
> > which you can also know. This leaves the middle of the key unknown.
> Ouch. What happens if you
> mkdir foo
> create foo/bar
> rename foo/bar baz/bar
> rmdir foo
> and ditto for
> mkdir foo
> create foo/bar
> open foo/bar
> unlink foo/bar
> rmdir foo
> - where does your packing locality <whatever> go?

The packing locality is saved in each directory entry and in the
in core inode.

> Another question: how does your write_inode() interact with readdir() and
> friends on parent? Do they need any sort of exclusion? What kind of
> metadata is stored in the directory and is it affected by notify_change()?

No metadata in directories (there are comments in the code that say
that they plan to change that in the future, but I have my doubts
if it is really a good idea)

> One more: suppose you've got
> create foo/bar
> link foo/bar baz/bar
> unlink foo/bar
> create foo/quux unlink baz/bar
> Do the last two of them need any sort of exclusion? VFS _does_ allow the
> last two happen at the same time. You have big lock, but if one of them
> blocks (and you are very unlikely to avoid that) - that's it.
> The reason why I'm asking is that I've done a filesystem with metadata in
> directory and I know what kind of PITA it can become. _And_ I've dealt
> with a monster that keeps metadata in directory trying to support links.
> In case if you wonder - it's AFFS. It's racey as hell and basically I'm
> trying to figure out how did you solve the same set of problems.

reiserfs does not have metadata in directories. The directories are just
like ordinary unix directories being basically a special kind of file with
points to other objects in the tree.

> > The immutability of keys is true merely because we don't yet have a repacker to
> > do re-optimize once a day, and it will change, but for now it is true. The
> > first version of our repacker probably won't change keys, but clearly it should
> > eventually have that functionality.
> And how do you expect NFS to handle that?

He can't. Unix also cannot handle changing inodes (objectids in reiserfs speak).
Oh well, not all future plans work out.

> > >
> > > Let me ask again:
> > > a) is your "32 bit object ID" sufficient to
> > > read/write/chmod/truncate/chown the file? Forget VFS, does the thing
> > > contain enough information to do it in principle (aside of scanning the
> > > whole tree, indeed)?
> > No, we need the directory id plus the objectid (plus the offset) to know what
> > the key is.
> What's the size of key?

Currently 12 bytes (packing locality, object id, offset). Offset is fixed
if you just want to find a file (and implicit in NFS anyways). It is also
fixed if you want to e.g. find the .. of a directory. So the NFS
fh needs to store 8 byte.

> > > b) if you have a directory, is its "32 bit object ID" sufficient to
> > > find ID of its parent?
> >
> > directories are their own parents as far as constructing keys is concerned in
> > plan A which is our next version, key construction for the current version puts
> > directories in the parent of the directory which is an ~10% performance losing
> > notion of a former project member, sigh. If 2.4 had been another month or two
> > out we would have fixed it, such is life.... it will wait for 2.6.
> Wait a minute. Assume that somebody gave you the key of directory. Can you
> find the key of its current parent? Notice that it could very well be
> renamed since the moment of its cretion.

Sure. Just look for (parent, directoryid, DOT_DOT_OFFSET) and read the
full parent key from the directory entry. It is really like Unix in
this regard.

> > > I assume that you _do_ support link(2) - if you don't I would really like
> > > to know what reiserfs_link() does. As in: why it is there and what will
> > > actually happen if you'll start calling it.
> >
> > link works.:-) I think I explained the difficult part above, if you need more
> > detail ask.
> Erm... I need details on the locking you use within your tree. This scheme
> has a huge potential for hidden (and hard to hit unless you know what you
> are hunting for) races, unless you are very accurate with that stuff. One
> more thing that began to scare me - readdir() and offsets in directory.
> How do you do them?

That is what they use all the ugly "has rescheduled" flags for. The entry
also has a "recently deleted" flag that is checked (hidden). When the tree
is rebalanced inbetween they catch that with a relookup (because offset
stays a fixed position). When you insert new entries before the current
filldir offset inbetween you lose (but standard ext2 is no better in that
as far as I can see). In the 2.3 version they use generation counts for
most things, so they can more easily find out if a object changed.
As far as I can see the reiserfs readdir handles all races the ext2
readdir handles.

[Note that I'm no reiserfs hacker, that just came from my own reading
of the source. Maybe it is better you apply the patch and do some
real RTFS ;]


P.S.: The loopback device in 2.3.51 seems to be still very unstable
when accessing block devices. I get weird hangs all the time. It
looks like it is leaking some important inode semaphore or similar,
because all new processes hang and then the remaining processes
start to block one after another as they try to do file system

To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to
Please read the FAQ at

This archive was generated by hypermail 2b29 : Thu Mar 23 2000 - 21:00:18 EST