Re: [PATCH v5 00/14] ceph: support idmapped mounts

From: Christian Brauner
Date: Wed Jun 14 2023 - 05:46:44 EST


On Tue, Jun 13, 2023 at 02:46:02PM +0200, Aleksandr Mikhalitsyn wrote:
> On Tue, Jun 13, 2023 at 3:43 AM Xiubo Li <xiubli@xxxxxxxxxx> wrote:
> >
> >
> > On 6/9/23 18:12, Aleksandr Mikhalitsyn wrote:
> > > On Fri, Jun 9, 2023 at 12:00 PM Christian Brauner <brauner@xxxxxxxxxx> wrote:
> > >> On Fri, Jun 09, 2023 at 10:59:19AM +0200, Aleksandr Mikhalitsyn wrote:
> > >>> On Fri, Jun 9, 2023 at 3:57 AM Xiubo Li <xiubli@xxxxxxxxxx> wrote:
> > >>>>
> > >>>> On 6/8/23 23:42, Alexander Mikhalitsyn wrote:
> > >>>>> Dear friends,
> > >>>>>
> > >>>>> This patchset was originally developed by Christian Brauner but I'll continue
> > >>>>> to push it forward. Christian allowed me to do that :)
> > >>>>>
> > >>>>> This feature is already actively used/tested with LXD/LXC project.
> > >>>>>
> > >>>>> Git tree (based on https://github.com/ceph/ceph-client.git master):
> > >>> Hi Xiubo!
> > >>>
> > >>>> Could you rebase these patches to 'testing' branch ?
> > >>> Will do in -v6.
> > >>>
> > >>>> And you still have missed several places, for example the following cases:
> > >>>>
> > >>>>
> > >>>> 1 269 fs/ceph/addr.c <<ceph_netfs_issue_op_inline>>
> > >>>> req = ceph_mdsc_create_request(mdsc, CEPH_MDS_OP_GETATTR,
> > >>>> mode);
> > >>> +
> > >>>
> > >>>> 2 389 fs/ceph/dir.c <<ceph_readdir>>
> > >>>> req = ceph_mdsc_create_request(mdsc, op, USE_AUTH_MDS);
> > >>> +
> > >>>
> > >>>> 3 789 fs/ceph/dir.c <<ceph_lookup>>
> > >>>> req = ceph_mdsc_create_request(mdsc, op, USE_ANY_MDS);
> > >>> We don't have an idmapping passed to lookup from the VFS layer. As I
> > >>> mentioned before, it's just impossible now.
> > >> ->lookup() doesn't deal with idmappings and really can't otherwise you
> > >> risk ending up with inode aliasing which is really not something you
> > >> want. IOW, you can't fill in inode->i_{g,u}id based on a mount's
> > >> idmapping as inode->i_{g,u}id absolutely needs to be a filesystem wide
> > >> value. So better not even risk exposing the idmapping in there at all.
> > > Thanks for adding, Christian!
> > >
> > > I agree, every time when we use an idmapping we need to be careful with
> > > what we map. AFAIU, inode->i_{g,u}id should be based on the filesystem
> > > idmapping (not mount),
> > > but in this case, Xiubo want's current_fs{u,g}id to be mapped
> > > according to an idmapping.
> > > Anyway, it's impossible at now and IMHO, until we don't have any
> > > practical use case where
> > > UID/GID-based path restriction is used in combination with idmapped
> > > mounts it's not worth to
> > > make such big changes in the VFS layer.
> > >
> > > May be I'm not right, but it seems like UID/GID-based path restriction
> > > is not a widespread
> > > feature and I can hardly imagine it to be used with the container
> > > workloads (for instance),
> > > because it will require to always keep in sync MDS permissions
> > > configuration with the
> > > possible UID/GID ranges on the client. It looks like a nightmare for sysadmin.
> > > It is useful when cephfs is used as an external storage on the host, but if you
> > > share cephfs with a few containers with different user namespaces idmapping...
> >
> > Hmm, while this will break the MDS permission check in cephfs then in
> > lookup case. If we really couldn't support it we should make it to
> > escape the check anyway or some OPs may fail and won't work as expected.
>
> Hi Xiubo!
>
> Disabling UID/GID checks on the MDS side looks reasonable. IMHO the
> most important checks are:
> - open
> - mknod/mkdir/symlink/rename
> and for these checks we already have an idmapping.
>
> Also, I want to add that it's a little bit unusual when permission
> checks are done against the caller UID/GID.

The server side permission checking based on the sender's fs{g,u}id is
rather esoteric imho. So I would just disable it for idmapped mounts.

> Usually, if we have opened a file descriptor and, for instance, passed
> this file descriptor through a unix socket then
> file descriptor holder will be able to use it in accordance with the
> flags (O_RDONLY, O_RDWR, ...).
> We also have ->f_cred on the struct file that contains credentials of
> the file opener and permission checks are usually done
> based on this. But in cephfs we are always using syscall caller's
> credentials. It makes cephfs file descriptor "not transferable"
> in terms of permission checks.

Yeah, that's another good point.