Re: [PATCH v5 00/14] ceph: support idmapped mounts

From: Xiubo Li
Date: Tue Jun 13 2023 - 21:53:54 EST



On 6/13/23 22:53, Gregory Farnum wrote:
On Mon, Jun 12, 2023 at 6:43 PM Xiubo Li <xiubli@xxxxxxxxxx> wrote:

On 6/9/23 18:12, Aleksandr Mikhalitsyn wrote:
On Fri, Jun 9, 2023 at 12:00 PM Christian Brauner <brauner@xxxxxxxxxx> wrote:
On Fri, Jun 09, 2023 at 10:59:19AM +0200, Aleksandr Mikhalitsyn wrote:
On Fri, Jun 9, 2023 at 3:57 AM Xiubo Li <xiubli@xxxxxxxxxx> wrote:
On 6/8/23 23:42, Alexander Mikhalitsyn wrote:
Dear friends,

This patchset was originally developed by Christian Brauner but I'll continue
to push it forward. Christian allowed me to do that :)

This feature is already actively used/tested with LXD/LXC project.

Git tree (based on https://github.com/ceph/ceph-client.git master):
Hi Xiubo!

Could you rebase these patches to the 'testing' branch?
Will do in -v6.

And you have still missed several places, for example the following call sites:


1. fs/ceph/addr.c:269 <<ceph_netfs_issue_op_inline>>
       req = ceph_mdsc_create_request(mdsc, CEPH_MDS_OP_GETATTR, mode);

2. fs/ceph/dir.c:389 <<ceph_readdir>>
       req = ceph_mdsc_create_request(mdsc, op, USE_AUTH_MDS);

3. fs/ceph/dir.c:789 <<ceph_lookup>>
       req = ceph_mdsc_create_request(mdsc, op, USE_ANY_MDS);
We don't have an idmapping passed to lookup from the VFS layer. As I
mentioned before, it's just impossible now.
->lookup() doesn't deal with idmappings and really can't, otherwise you
risk ending up with inode aliasing, which is really not something you
want. IOW, you can't fill in inode->i_{g,u}id based on a mount's
idmapping, as inode->i_{g,u}id absolutely needs to be a filesystem-wide
value. So better not even risk exposing the idmapping in there at all.
Thanks for adding, Christian!

I agree, every time we use an idmapping we need to be careful about
what we map. AFAIU, inode->i_{g,u}id should be based on the filesystem
idmapping (not the mount's), but in this case Xiubo wants
current_fs{u,g}id to be mapped according to an idmapping.
Anyway, it's impossible right now and, IMHO, until we have a
practical use case where UID/GID-based path restriction is used in
combination with idmapped mounts, it's not worth making such big
changes in the VFS layer.

Maybe I'm wrong, but it seems that UID/GID-based path restriction is
not a widespread feature, and I can hardly imagine it being used with
container workloads (for instance), because it would require always
keeping the MDS permissions configuration in sync with the possible
UID/GID ranges on the client. That looks like a nightmare for a sysadmin.
It is useful when cephfs is used as external storage on the host, but
not if you share cephfs with a few containers that have different user
namespace idmappings...
Hmm, this will break the MDS permission check in cephfs for the
lookup case. If we really can't support it, we should make it skip
the check; otherwise some OPs may fail and won't work as expected.
I don't pretend to know the details of the VFS (or even our linux
client implementation), but I'm confused that this is apparently so
hard. It looks to me like we currently always fill in the "caller_uid"
with "from_kuid(&init_user_ns, req->r_cred->fsuid)". Is this actually
valid to begin with? If it is, why can't the uid mapping be applied on
that?

As both the client and the server share authority over the inode's
state (including things like mode bits and owners), and need to do
permission checking, being able to tell the server the relevant actor
is inherently necessary. We also let admins restrict keys to
particular UID/GID combinations as they wish, and it's not the most
popular feature but it does get deployed. I would really expect a user
of UID mapping to be one of the *most* likely to employ such a
facility...maybe not with containers, but certainly end-user homedirs
and shared spaces.
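
The UID/GID restriction Greg mentions is set through the MDS capability string; roughly like the following (the client name, path, and ids are made up here, and the exact caps syntax should be checked against the Ceph documentation for your release):

```
ceph auth caps client.homedir \
    mon 'allow r' \
    osd 'allow rw pool=cephfs_data' \
    mds 'allow rw path=/home/alice uid=1000 gids=1000'
```

With a cap like this, the MDS only honors requests whose caller_uid/caller_gid match, which is exactly why the client must report a caller identity the server can evaluate.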

Disabling the MDS auth checks is really not an option. I guess we
could require any user employing idmapping to not be uid-restricted,
and set the anonymous UID (does that work, Xiubo, or was it the broken
one? In which case we'd have to default to root?). But that seems a
bit janky to me.

Yeah, this also seems risky.

Instead of disabling the MDS auth checks, there is another option: we can prevent the kclient from being mounted, or the idmapping from being applied. But this still has issues, for example: what if admins set the MDS auth caps after the idmap has been applied to the kclients?

IMO there are two options: the best is to fix this in the VFS, if possible. Otherwise, add an option in ceph to disable the corresponding MDS auth caps if users want the idmap feature.

Thanks

- Xiubo

-Greg

@Greg

For lookup requests we can't get the mapped UID/GID through the
idmapping the way we do for all the other requests, and the MDS
permission check needs it. Is it okay to disable the check for this
case? I'm afraid that would break the MDS permissions logic.

Any idea ?

Thanks

- Xiubo


Kind regards,
Alex