Re: [RFC PATCH 0/5] cgroup/cpuset: A new "isolcpus" paritition

From: Waiman Long
Date: Tue May 02 2023 - 17:27:14 EST


On 5/2/23 14:01, Michal Koutný wrote:
Hello.

The previous thread arrived incomplete to me, so I respond to the last
message only. Point me to a message URL if it was covered.

On Fri, Apr 14, 2023 at 03:06:27PM -0400, Waiman Long <longman@xxxxxxxxxx> wrote:
Below is a draft of the new cpuset.cpus.reserve cgroupfs file:

  cpuset.cpus.reserve
        A read-write multiple values file which exists on all
        cpuset-enabled cgroups.

        It lists the reserved CPUs to be used for the creation of
        child partitions.  See the section on "cpuset.cpus.partition"
        below for more information on cpuset partition.  These reserved
        CPUs should be a subset of "cpuset.cpus" and will be mutually
        exclusive of "cpuset.cpus.effective" when used since these
        reserved CPUs cannot be used by tasks in the current cgroup.

        There are two modes for partition CPU reservation -
        auto or manual.  The system starts up in auto mode where
        "cpuset.cpus.reserve" will be set automatically when valid
        child partitions are created and users don't need to touch the
        file at all.  This mode has the limitation that the parent of a
        partition must be a partition root itself.  So child partitions
        have to be created one by one from the cgroup root down.

        To enable the creation of a partition down in the hierarchy
        without requiring the intermediate cgroups to be partition roots,
Why would this be needed? Owning a CPU (a resource) must logically be
passed all the way from root to the target cgroup, i.e. this is
expressed by valid partitioning down to given level.

one
        has to turn on the manual reservation mode by writing directly
        to "cpuset.cpus.reserve" with a value different from its
        current value.  By distributing the reserve CPUs down the cgroup
        hierarchy to the parent of the target cgroup, this target cgroup
        can be switched to become a partition root if its "cpuset.cpus"
        is a subset of the set of valid reserve CPUs in its parent.
  level n
  `- level n+1
       cpuset.cpus            // these are actually configured by "owner" of level n
       cpuset.cpus.partition  // similarly here, level n decides if child is a partition

I.e. what would be level n/cpuset.cpus.reserve good for when it can
directly control level n+1/cpuset.cpus?

In the new scheme, the available CPUs are still passed directly down to a descendant cgroup. However, isolated CPUs (or, more generally, CPUs dedicated to a partition) have to be exclusive. What cpuset.cpus.reserve does is identify those exclusive CPUs so that they can be excluded from the effective_cpus of the parent cgroups before they are claimed by a child partition. Currently this is done automatically when a child partition is created under a parent partition root. The new scheme breaks it into two separate steps and drops the requirement that the parent of a partition be a partition root itself.
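
To make the two steps concrete, here is a rough Python model of the idea. It is only a sketch, not kernel code: the Cgroup class and the set_reserve() and make_partition() names are made up for illustration, and the effective-CPU computation is simplified to plain set arithmetic. It assumes, as the draft text says, that reserved CPUs must be a subset of "cpuset.cpus", that they are distributed down from the parent's reserve, and that a cgroup can become a partition root only if its "cpuset.cpus" is a subset of its parent's reserve.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Cgroup:
    """Toy stand-in for a cpuset-enabled cgroup (hypothetical model)."""
    name: str
    cpus: set[int]                       # models "cpuset.cpus"
    parent: Optional[Cgroup] = None
    reserve: set[int] = field(default_factory=set)  # models "cpuset.cpus.reserve"
    is_partition: bool = False

    def set_reserve(self, cpus: set[int]) -> None:
        # Step 1: set aside exclusive CPUs for future child partitions.
        if not cpus <= self.cpus:
            raise ValueError("reserve must be a subset of cpuset.cpus")
        if self.parent is not None and not cpus <= self.parent.reserve:
            raise ValueError("reserve must come from the parent's reserve")
        self.reserve = set(cpus)

    def make_partition(self) -> None:
        # Step 2: claim CPUs that the parent has already reserved.
        if self.parent is None or not self.cpus <= self.parent.reserve:
            raise ValueError("cpuset.cpus is not within the parent's reserve")
        self.is_partition = True

    def effective(self) -> set[int]:
        # Models "cpuset.cpus.effective": usable CPUs minus the reserved ones.
        if self.is_partition or self.parent is None:
            allowed = set(self.cpus)
        else:
            allowed = self.parent.effective() & self.cpus
        return allowed - self.reserve


# Reserve CPUs 6-7 at the root and pass the reserve through a
# non-partition intermediate cgroup down to the target cgroup.
root = Cgroup("root", set(range(8)))
root.set_reserve({6, 7})
mid = Cgroup("mid", set(range(8)), parent=root)
mid.set_reserve({6, 7})
leaf = Cgroup("leaf", {6, 7}, parent=mid)
leaf.make_partition()

for cg in (root, mid, leaf):
    print(cg.name, sorted(cg.effective()), "partition" if cg.is_partition else "member")
```

In this model "mid" never becomes a partition root, yet "leaf" can still claim CPUs 6-7, and those CPUs are already gone from the effective CPUs of both "root" and "mid" before the claim happens.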

Cheers,
Longman
