Re: [PATCH v5 00/14] ceph: support idmapped mounts

From: Aleksandr Mikhalitsyn
Date: Tue Jun 13 2023 - 12:27:40 EST


On Tue, Jun 13, 2023 at 4:54 PM Gregory Farnum <gfarnum@xxxxxxxxxx> wrote:
>
> On Mon, Jun 12, 2023 at 6:43 PM Xiubo Li <xiubli@xxxxxxxxxx> wrote:
> >
> >
> > On 6/9/23 18:12, Aleksandr Mikhalitsyn wrote:
> > > On Fri, Jun 9, 2023 at 12:00 PM Christian Brauner <brauner@xxxxxxxxxx> wrote:
> > >> On Fri, Jun 09, 2023 at 10:59:19AM +0200, Aleksandr Mikhalitsyn wrote:
> > >>> On Fri, Jun 9, 2023 at 3:57 AM Xiubo Li <xiubli@xxxxxxxxxx> wrote:
> > >>>>
> > >>>> On 6/8/23 23:42, Alexander Mikhalitsyn wrote:
> > >>>>> Dear friends,
> > >>>>>
> > >>>>> This patchset was originally developed by Christian Brauner but I'll continue
> > >>>>> to push it forward. Christian allowed me to do that :)
> > >>>>>
> > >>>>> This feature is already actively used/tested with LXD/LXC project.
> > >>>>>
> > >>>>> Git tree (based on https://github.com/ceph/ceph-client.git master):
> > >>> Hi Xiubo!
> > >>>
> > >>>> Could you rebase these patches to 'testing' branch ?
> > >>> Will do in -v6.
> > >>>
> > >>>> And you still have missed several places, for example the following cases:
> > >>>>
> > >>>>
> > >>>> 1 269 fs/ceph/addr.c <<ceph_netfs_issue_op_inline>>
> > >>>> req = ceph_mdsc_create_request(mdsc, CEPH_MDS_OP_GETATTR,
> > >>>> mode);
> > >>> +
> > >>>
> > >>>> 2 389 fs/ceph/dir.c <<ceph_readdir>>
> > >>>> req = ceph_mdsc_create_request(mdsc, op, USE_AUTH_MDS);
> > >>> +
> > >>>
> > >>>> 3 789 fs/ceph/dir.c <<ceph_lookup>>
> > >>>> req = ceph_mdsc_create_request(mdsc, op, USE_ANY_MDS);
> > >>> We don't have an idmapping passed to lookup from the VFS layer. As I
> > >>> mentioned before, it's just impossible now.
> > >> ->lookup() doesn't deal with idmappings and really can't otherwise you
> > >> risk ending up with inode aliasing which is really not something you
> > >> want. IOW, you can't fill in inode->i_{g,u}id based on a mount's
> > >> idmapping as inode->i_{g,u}id absolutely needs to be a filesystem wide
> > >> value. So better not even risk exposing the idmapping in there at all.
> > > Thanks for adding, Christian!
> > >
> > > I agree, every time when we use an idmapping we need to be careful with
> > > what we map. AFAIU, inode->i_{g,u}id should be based on the filesystem
> > > idmapping (not mount),
> > > but in this case, Xiubo want's current_fs{u,g}id to be mapped
> > > according to an idmapping.
> > > Anyway, it's impossible at now and IMHO, until we don't have any
> > > practical use case where
> > > UID/GID-based path restriction is used in combination with idmapped
> > > mounts it's not worth to
> > > make such big changes in the VFS layer.
> > >
> > > May be I'm not right, but it seems like UID/GID-based path restriction
> > > is not a widespread
> > > feature and I can hardly imagine it to be used with the container
> > > workloads (for instance),
> > > because it will require to always keep in sync MDS permissions
> > > configuration with the
> > > possible UID/GID ranges on the client. It looks like a nightmare for sysadmin.
> > > It is useful when cephfs is used as an external storage on the host, but if you
> > > share cephfs with a few containers with different user namespaces idmapping...
> >
> > Hmm, while this will break the MDS permission check in cephfs then in
> > lookup case. If we really couldn't support it we should make it to
> > escape the check anyway or some OPs may fail and won't work as expected.

Dear Gregory,

Thanks for the fast reply!

>
> I don't pretend to know the details of the VFS (or even our linux
> client implementation), but I'm confused that this is apparently so
> hard. It looks to me like we currently always fill in the "caller_uid"
> with "from_kuid(&init_user_ns, req->r_cred->fsuid))". Is this actually
> valid to begin with? If it is, why can't the uid mapping be applied on
> that?

Applying an idmapping is not hard, it's as simple as replacing
from_kuid(&init_user_ns, req->r_cred->fsuid)
to
from_vfsuid(req->r_mnt_idmap, &init_user_ns, VFSUIDT_INIT(req->r_cred->fsuid))

but the problem is that we don't have req->r_mnt_idmap for all the requests.
For instance, we don't have idmap arguments (that come from the VFS
layer) for ->lookup
operation and many others. There are some reasons for that (Christian
has covered some of them).
So, it's not about my laziness to implement that. It's a real pain ;-)

>
> As both the client and the server share authority over the inode's
> state (including things like mode bits and owners), and need to do
> permission checking, being able to tell the server the relevant actor
> is inherently necessary. We also let admins restrict keys to
> particular UID/GID combinations as they wish, and it's not the most
> popular feature but it does get deployed. I would really expect a user
> of UID mapping to be one of the *most* likely to employ such a
> facility...maybe not with containers, but certainly end-user homedirs
> and shared spaces.
>
> Disabling the MDS auth checks is really not an option. I guess we
> could require any user employing idmapping to not be uid-restricted,
> and set the anonymous UID (does that work, Xiubo, or was it the broken
> one? In which case we'd have to default to root?). But that seems a
> bit janky to me.

That's an interesting point about anonymous UID, but at the same time,
We use these caller's fs UID/GID values as an owner's UID/GID for
newly created inodes.
It means that we can't use anonymous UID everywhere in this case
otherwise all new files/directories
will be owned by an anonymous user.

> -Greg

Kind regards,
Alex

>
> > @Greg
> >
> > For the lookup requests the idmapping couldn't get the mapped UID/GID
> > just like all the other requests, which is needed by the MDS permission
> > check. Is that okay to make it disable the check for this case ? I am
> > afraid this will break the MDS permssions logic.
> >
> > Any idea ?
> >
> > Thanks
> >
> > - Xiubo
> >
> >
> > > Kind regards,
> > > Alex
> > >
> >
>