[PATCH 0/3] enable memcg accounting for kernfs objects

From: Vasily Averin
Date: Sun Jul 31 2022 - 11:37:26 EST


This patch set enables memcg accounting for kernfs-related objects.

It was originally part of the patch set
"memcg: accounting for objects allocated by mkdir cgroup"
https://lore.kernel.org/all/0fe836b4-5c0f-0e32-d511-db816d359748@xxxxxxxxxx/

The patches were approved by several developers. However, Michal Hocko
pointed out that, if necessary, cgroup consumption can be restricted
via the cgroup.max.descendants limit, without additional accounting
of allocated memory.
I still disagree with him: I think memory limits work better.
But I could not offer any substantial new arguments, so the discussion
stalled and the patches were left in limbo until better times.

However, 3 of these patches affect more than just cgroups, and I hope
to get help from the kernfs maintainers.

Kernfs nodes are quite small kernel objects; however, there are a few
scenarios where they consume a significant share of all allocated
memory. I am aware of the following cases, but I am sure there are
many others.

1) creating a new netdevice allocates ~50KB of memory, of which ~10KB
is allocated for 80+ kernfs nodes.

2) cgroupv2 mkdir allocates ~60KB of memory, of which ~10KB is kernfs
structures.

3) Shakeel Butt reports that Google has workloads which create 100s
of subcontainers and they have observed high system overhead
without memcg accounting of kernfs.

My experiments with an LXC container on a Fedora node show that
creating a new kernfs node usually allocates a few other objects
as well:

Allocs  Alloc  Allocation
number  size
----------------------------------------------------------------
1     + 128    (__kernfs_new_node+0x4d)    kernfs node
1     +  88    (__kernfs_iattrs+0x57)      kernfs iattrs
1     +  96    (simple_xattr_alloc+0x28)   simple_xattr(*)
1        32    (simple_xattr_set+0x59)
1         8    (__kernfs_new_node+0x30)

'+' -- to be accounted

(*) simple_xattr in this scenario was allocated directly during
kernfs node creation for the SELinux label. Even here it consumes a
noticeable part of the newly allocated objects.
However, please keep in mind that an xattr can also be allocated
later, via setxattr system calls; its size is controlled by userspace
and can reach 64KB per call. kernfs objects live in memory,
so it is important to account them.

Originally the patches were split to simplify their review;
however, if required I can merge them together.

Vasily Averin (3):
  memcg: enable accounting for kernfs nodes
  memcg: enable accounting for kernfs iattrs
  memcg: enable accounting for struct simple_xattr

 fs/kernfs/mount.c | 6 ++++--
 fs/xattr.c        | 2 +-
 2 files changed, 5 insertions(+), 3 deletions(-)

--
2.25.1