Re: [RFC][PATCHSET v3] non-recursive pathname resolution & RCU symlinks

From: Al Viro
Date: Sun May 17 2015 - 20:10:21 EST


On Mon, May 18, 2015 at 09:39:07AM +1000, NeilBrown wrote:

> There is no reason to be so gloomy.

RTFS.

> The VFS would provide a generic_do_last() (or whatever) which handles
> everything correctly for local filesystems which keep the dcache precisely
> consistent and use it for all the valuable locking it can provide.
> generic_do_last() would call into other filesystem entry points just like the
> current do_last() does. It wouldn't bother with 'revalidation' though.
>
> Then there might be a "generic_network_do_last()" which does minimal if any
> checking because the server will do all that, and just calls back to the
> filesystem depending on which particular operation is happening - mkdir, or
> unlink or whatever.

RTFPOSIX. The semantics of the last step on open are very different from
the rest due to symlink handling.

And do_last() has nothing whatsoever to do with mkdir() or unlink(). _Those_
are much simpler and don't go anywhere near that rats' nest of horrors.
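
To make the difference concrete, here is a minimal userspace sketch (not
kernel code) of how a trailing symlink is treated by mkdir(), unlink() and
several flavours of open(). The helper names errno_of() and
trailing_symlink_demo() are made up for illustration, and the ELOOP result
for O_NOFOLLOW is the Linux behaviour; other systems may report a different
errno.

```python
import errno
import os
import tempfile


def errno_of(call, *args):
    """Run call(*args); return the errno name it failed with, or "ok"."""
    try:
        result = call(*args)
    except OSError as exc:
        return errno.errorcode[exc.errno]
    if call is os.open:
        os.close(result)  # do not leak the descriptor on success
    return "ok"


def trailing_symlink_demo():
    """Probe a dangling symlink as the last path component (hypothetical demo)."""
    results = {}
    with tempfile.TemporaryDirectory() as tmp:
        target = os.path.join(tmp, "target")
        link = os.path.join(tmp, "link")
        os.symlink(target, link)  # link -> target, target does not exist

        # mkdir() never follows a trailing symlink: the name is already taken.
        results["mkdir"] = errno_of(os.mkdir, link)
        # O_CREAT|O_EXCL must not follow it either.
        results["open_excl"] = errno_of(
            os.open, link, os.O_WRONLY | os.O_CREAT | os.O_EXCL, 0o600)
        # O_NOFOLLOW refuses to open through a trailing symlink.
        results["open_nofollow"] = errno_of(
            os.open, link, os.O_RDONLY | os.O_NOFOLLOW)
        # Plain O_CREAT follows the dangling link and creates the target.
        results["open_creat"] = errno_of(
            os.open, link, os.O_WRONLY | os.O_CREAT, 0o600)
        results["target_created"] = os.path.exists(target)
        # unlink() removes the link itself and leaves the target alone.
        results["unlink"] = errno_of(os.unlink, link)
        results["target_survives"] = os.path.exists(target)
    return results


if __name__ == "__main__":
    for name, outcome in trailing_symlink_demo().items():
        print(f"{name}: {outcome}")
```

mkdir() and unlink() each have exactly one answer to "what if the last
component is a symlink"; open() has several, selected by flag combinations,
and that is only the part visible from userland.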

Seriously, read the damn thing. It *is* horrible, all right, but shifting
it inside NFS won't help you at all. Especially since you have NFSv3 to
cope with, which will take care of bringing in every sodding bit NFSv4 might
evade (for non-directories, that is - for directories you get the full shitpile
in your face, NFSv4 or not).

And no, server will _not_ "do all that". Again, mkdir and unlink are (almost)
trivial (unlink less so, due to NFS-specific shite). The real horrors are
on open().

I would be a lot less gloomy about discussing just passing the buck to
filesystems, starting with NFS, if NFS folks (you, in particular) had
bothered to figure out what the existing code _does_ and why it is doing
what it's doing. So far you have not - not even on the level of "which
functions are hit in which syscall", let alone what those functions are
doing.

And yes, the documentation of the whole thing is piss-poor. What we have
there right now is a weird mix of bits and pieces referring to very different
periods of evolution. Flat-out contradicting each other *and* the code.
I know. Believe me, I know. Fuck, right now the call graph for that thing
_finally_ fits into A4. Four years ago it took a goddamned A_1_. As in,
eight A4 sheets taped together. Yes, really.

At least now we finally have a reasonable chance of getting that sucker into
understandable shape and maybe even getting more folks to understand what
that code is doing. Which, unfortunately, _is_ a requirement for serious
reworks of that code. Frankly, the burden of keeping dcache consistent is
the least of the PITA in there.