Re: [PATCH v4 5/5] cgroup/cpuset: Optimize out unneeded cpuset_can_fork/cpuset_cancel_fork calls

From: Waiman Long
Date: Wed Apr 12 2023 - 15:24:52 EST


On 4/12/23 15:17, Tejun Heo wrote:
> Hello,
>
> On Wed, Apr 12, 2023 at 02:40:53PM -0400, Waiman Long wrote:
>> On 4/12/23 14:27, Tejun Heo wrote:
>>> On Tue, Apr 11, 2023 at 09:36:01AM -0400, Waiman Long wrote:
>>>> The newly introduced cpuset_can_fork() and cpuset_cancel_fork() calls
>>>> are only needed when the CLONE_INTO_CGROUP flag is set, which is
>>>> unlikely. Adding an extra cpuset_can_fork() call does introduce a bit
>>>> of performance overhead in the fork/clone fastpath. To reduce this
>>>> overhead, introduce a new clone_into_cgroup_can_fork flag into the
>>>> cgroup_subsys structure. When this flag is set, the can_fork and
>>>> cancel_fork methods will be called only if the CLONE_INTO_CGROUP flag
>>>> is set.
>>>>
>>>> The cpuset code is now modified to set this flag. The same cpuset
>>>> checking code in cpuset_can_fork() and cpuset_cancel_fork() will have
>>>> to stay, as the cgroups can be different while the cpusets may still
>>>> be the same. So the same check must be present in both cpuset_fork()
>>>> and cpuset_can_fork() to make sure that attach_in_progress is
>>>> correctly set.
>>>>
>>>> Signed-off-by: Waiman Long <longman@xxxxxxxxxx>
>>> Waiman, I'm not necessarily against this optimization but can we at least
>>> have some performance numbers to show that this is actually meaningful?
>>> Given how heavy our fork path is, I'm not too sure this would show up in any
>>> meaningful way.
>> That makes sense to me. I am OK with leaving it out for now, as it is an
>> optimization patch anyway.
>>
>> BTW, another question that I have is about the cgroup_threadgroup_rwsem. It
>> is currently a percpu rwsem. Is it possible to change it into a regular
>> rwsem instead? It is causing quite a bit of latency for workloads that
>> require rather frequent changes to cgroups. I know we have a "favordynmods"
>> mount option to disable the percpu operation, but that will still be less
>> performant than a normal rwsem. Of course, the downside is that the
>> fork/exit fastpaths will be slowed down a bit.
> I don't know. Maybe? A rwsem actually has a scalability factor in that the
> more CPUs are forking, the more expensive the rwsem becomes, so it is a bit
> more of a concern. Another factor is that in the majority of use cases we're
> almost completely avoiding write-locking the percpu_rwsem, so it feels a bit
> sad to convert it to a regular rwsem. So, if favordynmods is good enough, I'd
> like to keep it that way.

It is just a thought I had, since Juri is in the process of reverting the
conversion of cpuset_mutex to cpuset_rwsem. A percpu rwsem can be a bit
problematic in a PREEMPT_RT kernel, though, since it does not support
proper priority inheritance.

Cheers,
Longman