Re: nfsd changes for 3.5

From: Linus Torvalds
Date: Thu May 31 2012 - 18:15:16 EST


On Thu, May 31, 2012 at 1:53 PM, J. Bruce Fields <bfields@xxxxxxxxxxxx> wrote:
> On Thu, May 31, 2012 at 01:17:26PM -0700, Linus Torvalds wrote:
>
> Uh, that means ditching some work in my public git tree.  Which I
> haven't rebased in years.  So, a stupid process question; would you
> rather I:
>
>        - continue to be strict about rebasing and apply a bunch of
>          reverts?
>        - ditch it and start over?

I think in this case rebasing is the right thing to do.

I hate rebasing, but what I hate about it is how people who use it as
a development model cause problems for anybody else. I don't think it
will cause problems in this particular case, but if somebody hollers,
let me know.

>> Making
>> it an rwsem might help readdir a tiny amount, but I suspect people
>> actually depend on the mutex in readdir right now.
>
> Al called this all "highly non-trivial":
>
>        http://marc.info/?l=linux-fsdevel&m=132726495726326&w=2
>
> I don't know who'd have the cycles.

I agree, it's a rat's nest.

Doing lookups in particular is ridiculously single-threaded for almost
no good reason, though (and create is just a special case of that). It
*should* be possible to push the i_mutex down into the filesystem,
if we just created some fake dentry (with the appropriate support for
lookup to stall on it) to make sure that lookups of the same *name*
are serialized.

At that point, each filesystem could decide that they don't need the
i_mutex for the whole thing.

Maybe.
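
As a rough illustration of the "fake dentry" idea, here is a userspace
toy model in Python, not kernel code. The names Directory, slow_lookup
and _in_flight are made up for this sketch; slow_lookup merely stands in
for a filesystem's ->lookup(). A placeholder is registered per name, so
a second lookup of the same name stalls on it while lookups of other
names proceed in parallel:

```python
import threading


class Directory:
    """Toy model: lookups are serialized per name, not per directory."""

    def __init__(self, slow_lookup):
        # slow_lookup is a hypothetical stand-in for the filesystem's ->lookup().
        self._slow_lookup = slow_lookup
        self._lock = threading.Lock()  # protects only the two dicts, held briefly
        self._cache = {}               # name -> result of a completed lookup
        self._in_flight = {}           # name -> placeholder for a lookup in progress

    def lookup(self, name):
        while True:
            with self._lock:
                if name in self._cache:
                    return self._cache[name]
                placeholder = self._in_flight.get(name)
                if placeholder is None:
                    # Nobody is looking this name up: install our placeholder.
                    placeholder = threading.Event()
                    self._in_flight[name] = placeholder
                    break
            # Someone else owns the lookup of this name: stall, then recheck.
            placeholder.wait()

        try:
            # The slow part runs without any directory-wide lock held.
            result = self._slow_lookup(name)
            with self._lock:
                self._cache[name] = result
            return result
        finally:
            # Drop the placeholder and wake waiters, even if the lookup failed;
            # in that case one of the waiters retries the lookup itself.
            with self._lock:
                del self._in_flight[name]
            placeholder.set()
```

With this shape, only the dictionary updates are done under the shared
lock; whether a real filesystem could get away with so little is exactly
the open question above.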

And readdir() could be done mostly mechanically by changing i_mutex
into an rwsem, making all lockers use a write lock, and pushing the
locking down from the caller into the filesystem for ->readdir().
Again, at that point, I suspect many filesystems could do with much
less locking.
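
A minimal userspace sketch of that end state, again in Python and again
only a toy model: RWSem, Dir and the field name i_rwsem are illustrative
names, and the choice of a read lock in readdir() is an assumption about
what one particular filesystem might decide once the locking is its own
responsibility. Modifying operations keep taking the lock for write,
which is what the mechanical conversion would give every locker at first:

```python
import threading


class RWSem:
    """Minimal reader/writer lock (no fairness; writers can be starved)."""

    def __init__(self):
        self._cond = threading.Condition()
        self._readers = 0
        self._writer = False

    def down_read(self):
        with self._cond:
            while self._writer:
                self._cond.wait()
            self._readers += 1

    def up_read(self):
        with self._cond:
            self._readers -= 1
            if self._readers == 0:
                self._cond.notify_all()

    def down_write(self):
        with self._cond:
            while self._writer or self._readers:
                self._cond.wait()
            self._writer = True

    def up_write(self):
        with self._cond:
            self._writer = False
            self._cond.notify_all()


class Dir:
    """Toy directory whose methods do their own locking."""

    def __init__(self):
        self.i_rwsem = RWSem()  # illustrative name for the converted i_mutex
        self.entries = []

    def create(self, name):
        # Modifying operations still take the lock exclusively.
        self.i_rwsem.down_write()
        try:
            self.entries.append(name)
        finally:
            self.i_rwsem.up_write()

    def readdir(self):
        # Locking pushed down from the caller: this "filesystem" decides
        # that a shared lock is enough for listing entries.
        self.i_rwsem.down_read()
        try:
            return list(self.entries)
        finally:
            self.i_rwsem.up_read()
```

Several readdir() calls can then overlap, while create() still excludes
both readers and other writers.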

But yeah, it's all nasty. Even the purely mechanical part of changing
i_mutex to an rwsem would not only be a *huge* and painful patch, it
would hit things like lockdep issues too (we don't support the nesting
thing for rwsem annotations, afaik).

So nobody has really done it, and it's so painful that maybe nobody
will. There are workloads that hit this serialization point, but they
are *fairly* rare and specialized.

Linus