[RFC v3 0/5] overlayfs: Optimize override/revert creds

From: Vinicius Costa Gomes
Date: Fri Feb 16 2024 - 00:17:15 EST



Hi,

Changes from RFC v2:
- Added separate patches for the warnings for the discarded const
when using the cleanup macros: one for DEFINE_GUARD() and one for
DEFINE_LOCK_GUARD_1() (I am uncertain if it's better to squash them
together);
- Reordered the series so the backing file patch is the first user of
the introduced helpers (Amir Goldstein);
- Change the definition of the cleanup "class" from a GUARD to a
LOCK_GUARD_1, which defines an implicit container, that allows us
to remove some variable declarations to store the overriden
credentials (Amir Goldstein);
- Replaced most of the uses of scoped_guard() with guard(), to reduce
the code churn, the remaining ones I wasn't sure if I was changing
the behavior: either they were nested (overrides "inside"
overrides) or something calls current_cred() (Amir Goldstein).

New questions:
- The backing file callbacks are now called with the "light"
overriden credentials, so they are kind of restricted in what they
can do with their credentials, is this acceptable in general?
- in ovl_rename() I had to manually call the "light" the overrides,
both using the guard() macro or using the non-light version causes
the workload to crash the kernel. I still have to investigate why
this is happening. Hints are appreciated.

Link to the RFC v2:

https://lore.kernel.org/r/20240125235723.39507-1-vinicius.gomes@xxxxxxxxx/

Original cover letter (lightly edited):

It was noticed that some workloads suffer from contention on
increasing/decrementing the ->usage counter in their credentials,
those refcount operations are associated with overriding/reverting the
current task credentials. (the linked thread adds more context)

In some specialized cases, overlayfs is one of them, the credentials
in question have a longer lifetime than the override/revert "critical
section". In the overlayfs case, the credentials are created when the
fs is mounted and destroyed when it's unmounted. In this case of long
lived credentials, the usage counter doesn't need to be
incremented/decremented.

Add a lighter version of credentials override/revert to be used in
these specialized cases. To make sure that the override/revert calls
are paired, add a cleanup guard macro. This was suggested here:

https://lore.kernel.org/all/20231219-marken-pochen-26d888fb9bb9@brauner/

With a small number of tweaks:
- Used inline functions instead of macros;
- A small change to store the credentials into the passed argument,
the guard is now defined as (note the added '_T ='):

DEFINE_GUARD(cred, const struct cred *, _T = override_creds_light(_T),
revert_creds_light(_T));

- Allow "const" arguments to be used with these kind of guards;

Some comments:
- If patch 1/5 and 2/5 are not a good idea (adding the cast), the
alternative I can see is using some kind of container for the
credentials;
- The only user for the backing file ops is overlayfs, so these
changes make sense, but may not make sense in the most general
case;

For the numbers, some from 'perf c2c', before this series:
(edited to fit)

#
# ----- HITM ----- Shared
# Num RmtHitm LclHitm Symbol Object Source:Line Node
# ..... ....... ....... .......................... ................ .................. ....
#
-------------------------
0 412 1028
-------------------------
41.50% 42.22% [k] revert_creds [kernel.vmlinux] atomic64_64.h:39 0 1
15.05% 10.60% [k] override_creds [kernel.vmlinux] atomic64_64.h:25 0 1
0.73% 0.58% [k] init_file [kernel.vmlinux] atomic64_64.h:25 0 1
0.24% 0.10% [k] revert_creds [kernel.vmlinux] cred.h:266 0 1
32.28% 37.16% [k] generic_permission [kernel.vmlinux] mnt_idmapping.h:81 0 1
9.47% 8.75% [k] generic_permission [kernel.vmlinux] mnt_idmapping.h:81 0 1
0.49% 0.58% [k] inode_owner_or_capable [kernel.vmlinux] mnt_idmapping.h:81 0 1
0.24% 0.00% [k] generic_permission [kernel.vmlinux] namei.c:354 0

-------------------------
1 50 103
-------------------------
100.00% 100.00% [k] update_cfs_group [kernel.vmlinux] atomic64_64.h:15 0 1

-------------------------
2 50 98
-------------------------
96.00% 96.94% [k] update_cfs_group [kernel.vmlinux] atomic64_64.h:15 0 1
2.00% 1.02% [k] update_load_avg [kernel.vmlinux] atomic64_64.h:25 0 1
0.00% 2.04% [k] update_load_avg [kernel.vmlinux] fair.c:4118 0
2.00% 0.00% [k] update_cfs_group [kernel.vmlinux] fair.c:3932 0 1

after this series:

#
# ----- HITM ----- Shared
# Num RmtHitm LclHitm Symbol Object Source:Line Node
# ..... ....... ....... .................... ................ ................ ....
#
-------------------------
0 54 88
-------------------------
100.00% 100.00% [k] update_cfs_group [kernel.vmlinux] atomic64_64.h:15 0 1

-------------------------
1 48 83
-------------------------
97.92% 97.59% [k] update_cfs_group [kernel.vmlinux] atomic64_64.h:15 0 1
2.08% 1.20% [k] update_load_avg [kernel.vmlinux] atomic64_64.h:25 0 1
0.00% 1.20% [k] update_load_avg [kernel.vmlinux] fair.c:4118 0 1

-------------------------
2 28 44
-------------------------
85.71% 79.55% [k] generic_permission [kernel.vmlinux] mnt_idmapping.h:81 0 1
14.29% 20.45% [k] generic_permission [kernel.vmlinux] mnt_idmapping.h:81 0 1

The contention is practically gone.

Link: https://lore.kernel.org/all/20231018074553.41333-1-hu1.chen@xxxxxxxxx/

Vinicius Costa Gomes (5):
cleanup: Fix discarded const warning when defining lock guard
cleanup: Fix discarded const warning when defining guard
cred: Add a light version of override/revert_creds()
fs: Optimize credentials reference count for backing file ops
overlayfs: Optimize credentials usage

fs/backing-file.c | 27 +++++--------------
fs/overlayfs/copy_up.c | 4 +--
fs/overlayfs/dir.c | 23 +++++++---------
fs/overlayfs/file.c | 28 +++++--------------
fs/overlayfs/inode.c | 59 +++++++++++++++--------------------------
fs/overlayfs/namei.c | 20 ++++----------
fs/overlayfs/readdir.c | 16 +++--------
fs/overlayfs/util.c | 10 +++----
fs/overlayfs/xattrs.c | 33 +++++++++--------------
include/linux/cleanup.h | 4 +--
include/linux/cred.h | 21 +++++++++++++++
kernel/cred.c | 6 ++---
12 files changed, 97 insertions(+), 154 deletions(-)

--
2.43.0