Re: [RFC] HACK: overlayfs: Optimize overlay/restore creds

From: Amir Goldstein
Date: Fri Dec 15 2023 - 05:31:15 EST


+fsdevel because this may be relevant to any subsystem that
keeps a long live cred copy (e.g. nfsd, ksmbd, cachefiles).

+linus who wrote
d7852fbd0f04 access: avoid the RCU grace period for the temporary
subjective credentials


On Fri, Dec 15, 2023 at 12:02 AM Vinicius Costa Gomes
<vinicius.gomes@xxxxxxxxx> wrote:
>
> Permission checks in overlayfs also check against the credentials
> associated with the superblock, which are assigned at mount() time, so
> pretty long lived. So, we can omit the reference counting for this
> case.

You forgot to mention WHY you are proposing this and to link to the
original report with the first optimization attempt:

https://lore.kernel.org/linux-unionfs/20231018074553.41333-1-hu1.chen@xxxxxxxxx/

>
> This (very early) proof of concept does two things:
>
> Add a flag "immutable" (TODO: find a better name) to struct cred to
> indicate that it is long lived, and that refcount can be omitted.
>

This reminds me of the many discussions about Rust abstractions
that are going on right now.
I think an abstraction like this one is called a "borrowed reference".

> Add "guard" helpers, so we can use automatic cleanup to be sure
> override/restore are always paired. (I didn't like that I have
> 'ovl_cred' to be a container for the credentials, but couldn't think
> of other solutions)
>

I like the guard but see comments below...

> Signed-off-by: Vinicius Costa Gomes <vinicius.gomes@xxxxxxxxx>
> ---
> Hi Amir,
>
> Just to know if I am more or less on right track.
>
> This is a different attempt, instead of the local copy idea, I am
> using the fact that the credentials associated with the mount() will
> be alive for a long time. I think the result is almost the same. But I
> could be missing something.
>
> TODO:
> - Add asserts.
> - Replace ovl_override_creds()/revert_Creds() by
> ovl_creator_cred()/guard() everywhere.
> - Probably more.
>
>
> fs/overlayfs/inode.c | 7 ++++---
> fs/overlayfs/overlayfs.h | 18 ++++++++++++++++++
> fs/overlayfs/params.c | 4 +++-
> fs/overlayfs/super.c | 10 +++++++---
> fs/overlayfs/util.c | 10 ++++++++++
> include/linux/cred.h | 12 ++++++++++--
> 6 files changed, 52 insertions(+), 9 deletions(-)
>
> diff --git a/fs/overlayfs/inode.c b/fs/overlayfs/inode.c
> index c63b31a460be..2c016a3bbe2d 100644
> --- a/fs/overlayfs/inode.c
> +++ b/fs/overlayfs/inode.c
> @@ -290,9 +290,9 @@ int ovl_permission(struct mnt_idmap *idmap,
> struct inode *inode, int mask)
> {
> struct inode *upperinode = ovl_inode_upper(inode);
> + struct ovl_cred ovl_cred;
> struct inode *realinode;
> struct path realpath;
> - const struct cred *old_cred;

Nit: please don't reorder the variable definitions.

> int err;
>
> /* Careful in RCU walk mode */
> @@ -310,7 +310,9 @@ int ovl_permission(struct mnt_idmap *idmap,
> if (err)
> return err;
>
> - old_cred = ovl_override_creds(inode->i_sb);
> + ovl_cred = ovl_creator_cred(inode->i_sb);
> + guard(ovl_creds)(&ovl_cred);
> +
> if (!upperinode &&
> !special_file(realinode->i_mode) && mask & MAY_WRITE) {
> mask &= ~(MAY_WRITE | MAY_APPEND);
> @@ -318,7 +320,6 @@ int ovl_permission(struct mnt_idmap *idmap,
> mask |= MAY_READ;
> }
> err = inode_permission(mnt_idmap(realpath.mnt), realinode, mask);
> - revert_creds(old_cred);
>
> return err;
> }
> diff --git a/fs/overlayfs/overlayfs.h b/fs/overlayfs/overlayfs.h
> index 05c3dd597fa8..22ea3066376e 100644
> --- a/fs/overlayfs/overlayfs.h
> +++ b/fs/overlayfs/overlayfs.h
> @@ -416,6 +416,24 @@ static inline int ovl_do_getattr(const struct path *path, struct kstat *stat,
> return vfs_getattr(path, stat, request_mask, flags);
> }
>
> +struct ovl_cred {
> + const struct cred *cred;
> +};
> +
> +static inline struct ovl_cred ovl_creator_cred(struct super_block *sb)
> +{
> + struct ovl_fs *ofs = OVL_FS(sb);
> +
> + return (struct ovl_cred) { .cred = ofs->creator_cred };
> +}
> +
> +void ovl_override_creds_new(struct ovl_cred *creator_cred);
> +void ovl_revert_creds_new(struct ovl_cred *creator_cred);
> +
> +DEFINE_GUARD(ovl_creds, struct ovl_cred *,
> + ovl_override_creds_new(_T),
> + ovl_revert_creds_new(_T));
> +

This pattern is not unique to overlayfs.
It is probably better to define a common container type struct override_cred
in cred.h/cred.c that other code could also use.

> /* util.c */
> int ovl_get_write_access(struct dentry *dentry);
> void ovl_put_write_access(struct dentry *dentry);
> diff --git a/fs/overlayfs/params.c b/fs/overlayfs/params.c
> index 3fe2dde1598f..008377b9241a 100644
> --- a/fs/overlayfs/params.c
> +++ b/fs/overlayfs/params.c
> @@ -770,8 +770,10 @@ void ovl_free_fs(struct ovl_fs *ofs)
> kfree(ofs->config.lowerdirs);
> kfree(ofs->config.upperdir);
> kfree(ofs->config.workdir);
> - if (ofs->creator_cred)
> + if (ofs->creator_cred) {
> + cred_set_immutable(ofs->creator_cred, false);
> put_cred(ofs->creator_cred);

Not happy about this API.

Two solutions I can think of:
1. (my preference) keep two copies of creator_cred, one refcounted copy
and one non-refcounted that is used for override_creds()
2. put_cred_ref() which explicitly opts-in to dropping refcount on
a borrowed reference, same as you do above but hidden behind
a properly documented helper

> + }
> kfree(ofs);
> }
>
> diff --git a/fs/overlayfs/super.c b/fs/overlayfs/super.c
> index a0967bb25003..1ffb4f0f8186 100644
> --- a/fs/overlayfs/super.c
> +++ b/fs/overlayfs/super.c
> @@ -1304,6 +1304,13 @@ int ovl_fill_super(struct super_block *sb, struct fs_context *fc)
> if (!cred)
> goto out_err;
>
> + /* Never override disk quota limits or use reserved space */
> + cap_lower(cred->cap_effective, CAP_SYS_RESOURCE);
> + /* The cred that is going to be associated with the super
> + * block will not change.
> + */
> + cred_set_immutable(cred, true);
> +

Likewise, either:
1. Create a non-refcounted copy of creator_cred
or
2. Use a documented helper prepare_creds_ref() to hide
this implementation detail

> err = ovl_fs_params_verify(ctx, &ofs->config);
> if (err)
> goto out_err;
> @@ -1438,9 +1445,6 @@ int ovl_fill_super(struct super_block *sb, struct fs_context *fc)
> else if (!ofs->nofh)
> sb->s_export_op = &ovl_export_fid_operations;
>
> - /* Never override disk quota limits or use reserved space */
> - cap_lower(cred->cap_effective, CAP_SYS_RESOURCE);
> -
> sb->s_magic = OVERLAYFS_SUPER_MAGIC;
> sb->s_xattr = ovl_xattr_handlers(ofs);
> sb->s_fs_info = ofs;
> diff --git a/fs/overlayfs/util.c b/fs/overlayfs/util.c
> index c3f020ca13a8..9ae9a35a6a7a 100644
> --- a/fs/overlayfs/util.c
> +++ b/fs/overlayfs/util.c
> @@ -68,6 +68,16 @@ const struct cred *ovl_override_creds(struct super_block *sb)
> return override_creds(ofs->creator_cred);
> }
>
> +void ovl_override_creds_new(struct ovl_cred *creator_cred)
> +{
> + creator_cred->cred = override_creds(creator_cred->cred);
> +}
> +
> +void ovl_revert_creds_new(struct ovl_cred *creator_cred)
> +{
> + revert_creds(creator_cred->cred);
> +}

Would look nicer in this generic form, no?

void override_cred_save(struct override_cred *override)
{
override->cred = override_creds(override->cred);
}

void override_cred_restore(struct override_cred *old)
{
revert_creds(old->cred);
}

Which reminds me that memalloc_*_{save,restore} are good
candidates for defining a guard.

> +
> /*
> * Check if underlying fs supports file handles and try to determine encoding
> * type, in order to deduce maximum inode number used by fs.
> diff --git a/include/linux/cred.h b/include/linux/cred.h
> index af8d353a4b86..06eaedfe48ea 100644
> --- a/include/linux/cred.h
> +++ b/include/linux/cred.h
> @@ -151,6 +151,7 @@ struct cred {
> int non_rcu; /* Can we skip RCU deletion? */
> struct rcu_head rcu; /* RCU deletion hook */
> };
> + bool immutable;
> } __randomize_layout;
>

If we choose the design that the immutable/non-refcount property
is a const property and we need to create a copy of struct cred
whenever we want to use a non-refcounted copy, then we could
store this in the union because RCU deletion is also not needed for
non-refcounted copy:

struct {
int non_refcount:1; /* A borrowed reference? */
int non_rcu:1; /* Can we skip RCU deletion? */
};
struct rcu_head rcu; /* RCU deletion hook */
};

> extern void __put_cred(struct cred *);
> @@ -229,7 +230,8 @@ static inline bool cap_ambient_invariant_ok(const struct cred *cred)
> */
> static inline struct cred *get_new_cred_many(struct cred *cred, int nr)
> {
> - atomic_add(nr, &cred->usage);
> + if (!cred->immutable)
> + atomic_add(nr, &cred->usage);
> return cred;
> }
>
> @@ -245,6 +247,12 @@ static inline struct cred *get_new_cred(struct cred *cred)
> return get_new_cred_many(cred, 1);
> }
>
> +static inline void cred_set_immutable(const struct cred *cred, bool imm)
> +{
> + struct cred *nonconst_cred = (struct cred *) cred;
> + nonconst_cred->immutable = imm;
> +}
> +
> /**
> * get_cred_many - Get references on a set of credentials
> * @cred: The credentials to reference
> @@ -313,7 +321,7 @@ static inline void put_cred_many(const struct cred *_cred, int nr)
>
> if (cred) {
> validate_creds(cred);
> - if (atomic_sub_and_test(nr, &cred->usage))
> + if (!cred->immutable && atomic_sub_and_test(nr, &cred->usage))
> __put_cred(cred);
> }
> }
> --
> 2.43.0
>

Thanks,
Amir.