Re: [PATCH] Bind Mount Extensions 0.04.1 3/5

From: viro
Date: Fri Mar 19 2004 - 09:53:53 EST


On Fri, Mar 19, 2004 at 02:40:28PM +0100, Herbert Poetzl wrote:
> > FWIW, I would start with
> > 1) split out simple_permission() - vfs_permission() sans the
> > r/o checks; vfs_permission() would call it and all in-tree calls of
> > vfs_permission() would get expanded.
>
> -· vfs_permission()
> ¦-· hfs_permission()
> ¦-· hfsplus_permission()
> ¦-· nfs_permission()
> ¦-* permission()
> ¦-· presto_permission()
> '-· proc_permission()
>
> please elaborate ...

First chunk: replace calls of vfs_permission() with r/o checks +
simple_permission(), _leaving_ vfs_permission() _itself_ _as-is_.

> > 2) prove that all instances of ->permission() honour r/o checks.
> > Fix the broken ones (and yes, we do have them - e.g. hfs_permission()
> > or bogus return values in proc_permission()), after we'd shown that
> > it's safe. Note that it's not obvious - e.g. anything around ACLs or
> > <barf> XFS ioctls is not just fscking ugly - it's brittle as hell and
> > will require very careful treatment.
>
> okay, but why not do it in permission(), and make
> the few callers of vfs_permission use that one instead?

Filesystem code might be calling permission(9) and expecting (broken)
behaviour from its own ->permission(). Switch to uniform behaviour
would break such places, forcing us to compensate there. And analysis
won't be fun.

> except for jfs_permission() and nfsd_permission()
> (which is hell enough) there are no direct calls
> of *_permission() ...

fs code calling permission() is basically equivalent to direct call.

> > 3) once that is done, put r/o checks into the beginning of
> > permission(9)
>
> hmm, obviously you didn't read the patch at all, as that
> is exactly what I did, so please have a look at it ...

Yes, I have read it. Try to read the reply - what I'm saying is that it
should be a separate chunk in the series _and_ it will need some preparations
before it can go in.

> > 4) for all instances of ->permission(), move r/o checks in
> > the places that call that instance directly. Remove them from method
> > itself.
> >
> > And yes, #2 will hurt. Badly.
>
> I don't see why, please enlighten me ...

See above.

> > BTW, IS_RDONLY() part of that stuff will really hit the fan when you start
> > touching the FPOS in fs/ext2/xattr.c and around it. Have fun...
> >
> > Note that it's not enough to bring relevant vfsmount to every caller of
> > IS_RDONLY() - if we are calling it to make sure that fs is not r/o,
> > we _really_ want to make sure that it doesn't get remounted r/o just as
> > IS_RDONLY() returns. And yes, there are real bugs in that area.
>
> hmm, well but how would moving the ro checks around
> or adding them for the vfs mounts influence that?
>
> I agree that this _is_ an issue but why should this _not_
> happen with mount -o remount,ro /xy ?

Once the checks are moved up, we split them into pairs (want_write_access(),
drop_write_access()) where the former can fail. Then the second call gets
moved past the call of fs method. IOW, we go from

start fs activity
deep in the bowels of fs code check for r/o and possibly fail
do the rest

to
check for r/o and possibly fail
do fs work

_and_ to

request write access and possibly fail
do fs work
drop write access

Transition between (2) and (3) is simple and relatively easy to verify
_IF_ the checks are lifted up. And result will give sane protection
for remount - switch to r/o would have to be atomic wrt requesting
write access and would fail if we had writers.

Note that per-mountpoint r/o will take pretty much the same amount of work -
propagating vfsmounts down to the IS_RDONLY checks only to have that reverted
when we lift the checks up would mean doing more or less the same twice.

It can be done and it can be merged gradually in small and simple chunks,
so that they could be (a) well-understood and (b) well-tested.

The way we split the series *does* matter - composition of these patches
will be large and will touch quite a few places (there's extra fun with
delayed writes - e.g. removal of last link in foo_unlink() will mean
extra request for write access, dropped when inode is finally deleted,
possibly hours after unlink(2) had returned). Doing that in one or even
3-4 patches would be insane even in 2.7; in 2.6 it's so out of question
that it's not even funny.
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/