Re: [RFC][PATCHSET v3] non-recursive pathname resolution & RCU symlinks

From: NeilBrown
Date: Sat May 16 2015 - 23:04:01 EST


On Sat, 16 May 2015 06:46:26 +0100 Al Viro <viro@xxxxxxxxxxxxxxxxxx> wrote:

> On Sat, May 16, 2015 at 02:45:27PM +1000, NeilBrown wrote:
>
> > Yes, I've looked lately :-)
> > I think that all of RCU-walk, and probably some of REF-walk should happen
> > before the filesystem gets to see anything.
> > But once you hit a non-positive dentry or the parent of the target name, I'd
> > rather hand over to the FS.
>
> ... and be ready to get it back when the sucker runs into a symlink. Unless
> you want to handle _those_ in NFS somehow (including an absolute one starting
> with /sys/, etc.).

Certainly - when a symlink or mountpoint is found, the filesystem stops.
Mountpoints should rarely be hit because the path to a mountpoint will
usually be stable enough for RCU-walk to find it....
Thinks: I wonder what happens when a mounted-on NFS directory is deleted on the
server...

Automountpoints would be handled completely by the filesystem. It would
mount something and then return saying "Look, I found a mount point - you
wanna handle it for me?".


>
> > NFSv4 has the ability to look up multiple components in a single LOOKUP call.
> > VFS doesn't give it a chance to try because it wants to go step-by-step, and
> > wants each entry in the cache to have an inode etc.
>
> Do tell, how do we deal with .. afterwards if we leave the intermediate ones
> without inodes? We _could_ feed multi-component requests to filesystems
> (and NFSv4 isn't the first one to handle that - 9p had been there a lot
> earlier), but then you get to
> * populate all of them with inodes
> * be damn careful to avoid multiple dentries for the same directory
> inode

NFS directories already need to be revalidated occasionally. Having a dentry
in "unknown" state just means a revalidation is that much more likely.

Suppose I cd into a directory, then rename the directory on the server. What
happens? What should happen?
I could make a case that the NFS client should lookup ".." on the server and
rebuild the path upwards.

There is a (to me) really key point here.
Local filesystems use the dcache for correctness. It prevents concurrent
directory renames from creating loops and it ensures that only one file of a
given name exists in each directory.

Remote filesystems don't use it for correctness. For them it is simply an
optimisation. So getting upset about directories with multiple dentries, or
directories that aren't connected to the root is very important for local
filesystems, and largely irrelevant for network filesystems.

A local filesystem needs the cache to remain consistent with storage. A
network filesystem cannot possibly ensure that the cache is consistent with
storage, and just needs to be able to notice the more offensive
inconsistencies reasonably quickly, and repair them.

> Look, creating those suckers isn't the worst part; you need to be ready for
> e.g. mount(2) or pathname resolution playing with the ones you'd created.
> It's not fs-private data structure; pathname resolution might very well span
> many filesystem types.

Any pathname lookup which touched these dentries would call d_revalidate()
(or similar) which could get the inode etc if it was really needed.
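
As a very rough illustration of that idea, here is a toy model of a dentry
that starts out "unknown" and only acquires an inode when something
revalidates it. Everything in it (toy_dentry, toy_revalidate,
toy_server_getattr, the "gone" name) is made up for this sketch; it is not
the kernel's d_revalidate() interface and it ignores locking, RCU and
everything else that makes the real thing hard.

```c
#include <assert.h>
#include <string.h>

/* Toy model only: none of these are kernel types or functions. */
enum toy_state { TOY_UNKNOWN, TOY_POSITIVE, TOY_NEGATIVE };

struct toy_inode {
    unsigned long ino;
};

struct toy_dentry {
    const char *name;
    enum toy_state state;
    struct toy_inode inode;   /* meaningful only when state is TOY_POSITIVE */
    int server_calls;         /* how many round trips this dentry has cost */
};

/*
 * Hypothetical stand-in for a server round trip.  Pretends that every
 * name exists except "gone".  Returns 0 on success, -1 if not found.
 */
static int toy_server_getattr(const char *name, struct toy_inode *out)
{
    if (strcmp(name, "gone") == 0)
        return -1;
    out->ino = 100 + (unsigned long)strlen(name);
    return 0;
}

/*
 * Revalidate on demand: an "unknown" dentry is resolved the first time
 * anyone actually needs it; later calls reuse the cached answer.
 * Returns 1 if the dentry is usable (positive), 0 otherwise.
 */
static int toy_revalidate(struct toy_dentry *d)
{
    assert(d != NULL && d->name != NULL);

    if (d->state == TOY_UNKNOWN) {
        d->server_calls++;
        if (toy_server_getattr(d->name, &d->inode) == 0)
            d->state = TOY_POSITIVE;
        else
            d->state = TOY_NEGATIVE;
    }
    return d->state == TOY_POSITIVE;
}
```

The only point is that the cost of fetching the inode is paid by whoever
first needs it, not by the lookup that created the dentry.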

>
> Worse, you get to deal with several multi-component requests jumping into
> fs at the same place. With responses arriving a bit afterwards, and guess
> what? Those requests happen to share bits and pieces of prefixes. Oh,
> and one of them is a rename. Dealing with just the final components isn't
> a problem; you'll need to deal with directory tree in all its fscking glory.
> In a way that wouldn't be in too incestuous relationship with the pathwalking
> logics in VFS and, by that proxy, such in all other fs types.
>
> In particular, "unknown" for intermediate nodes is a recipe for really
> nasty mess. If the path can rejoin the known universe several components
> later... <shudder>
>
> Dealing with multi-component lookups isn't impossible and might be a good
> idea, but only if all intermediates are populated. What information does
> NFSv4 multi-component lookup give you? 9p one gives an array of FIDs,
> one per component, and that is best used as multi-component revalidate
> on hot dcache...

I was remembering RFC3010 in which a LOOKUP had a "pathname4" which was an
array of "component4". It could just return the filehandle and attributes of
the final target.
RFC3530 and later revised that so "LOOKUP" gets a "component4". That just
means that it is easy to get the attributes if you want them.
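
A rough sketch of what the single-component form looks like when chained
in one compound: one LOOKUP per component, with an optional GETATTR after
each, which is where the intermediate attributes would come from. The
types and the function name are toy ones invented for this example, not
anything from the NFS client or the XDR definitions.

```c
#include <assert.h>
#include <string.h>

/* Toy model only: not the NFS client's or the RFC's data structures. */
enum toy_op { TOY_PUTFH, TOY_LOOKUP, TOY_GETATTR, TOY_GETFH };

struct toy_compound_op {
    enum toy_op op;
    char name[64];            /* component name, used by TOY_LOOKUP only */
};

/*
 * Split a path such as "a/b/c" into components and emit
 *   PUTFH, (LOOKUP <component> [, GETATTR])..., GETFH
 * Returns the number of ops written, or -1 if they do not fit.
 */
static int toy_build_lookup_compound(const char *path, int want_attrs,
                                     struct toy_compound_op *ops, int max_ops)
{
    const char *p = path;
    int n = 0;

    assert(path != NULL && ops != NULL);
    if (max_ops < 1)
        return -1;
    ops[n].op = TOY_PUTFH;
    ops[n].name[0] = '\0';
    n++;

    while (*p) {
        size_t len = strcspn(p, "/");   /* length of the next component */

        if (len > 0) {
            int need = want_attrs ? 2 : 1;

            if (len >= sizeof(ops[0].name) || n + need > max_ops)
                return -1;
            ops[n].op = TOY_LOOKUP;
            memcpy(ops[n].name, p, len);
            ops[n].name[len] = '\0';
            n++;
            if (want_attrs) {
                ops[n].op = TOY_GETATTR;
                ops[n].name[0] = '\0';
                n++;
            }
        }
        p += len;
        while (*p == '/')               /* skip separators */
            p++;
    }

    if (n >= max_ops)
        return -1;
    ops[n].op = TOY_GETFH;
    ops[n].name[0] = '\0';
    n++;
    return n;
}
```

With want_attrs set, "a/b/c" becomes eight ops; without it, five, and the
client learns nothing about "a" or "b" beyond the fact that they resolved.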

I'm not really saying that multiple component lookups are a good idea, or
that doing the lookup and not getting the intermediate attributes is a
sensible approach. What I'm really pointing out is that the current dcache
imposes a particular model very strongly on filesystems and I'm far from
convinced that that is a good idea.

NeilBrown
