Re: [PATCH 1/1] cgroup/cpuset: Rebuild sched domains if isolated partition changed

From: Waiman Long
Date: Mon Sep 18 2023 - 12:51:07 EST


On 9/18/23 03:07, Pierre Gondois wrote:
Hello Longman,

On 9/17/23 18:51, Waiman Long wrote:
On 9/15/23 11:45, Pierre Gondois wrote:
When an isolated partition is created, the sched domains (sds) are
rebuilt and the sds of the isolated CPUs are detached. This only
happens at the creation of the isolated partition. Updating
the cpuset of the partition doesn't rebuild the sds:

To reproduce:
    # ls /sys/kernel/debug/sched/domains/cpu0/
    domain0
    # ls /sys/kernel/debug/sched/domains/cpu1/
    domain0
    #
    # mkdir cgroup
    # mount -t cgroup2 none cgroup/
    # mkdir cgroup/A1/
    # echo "+cpuset" > cgroup/cgroup.subtree_control
    # echo 0-3 > cgroup/A1/cpuset.cpus
    # echo isolated > cgroup/A1/cpuset.cpus.partition
    #
    # ls /sys/kernel/debug/sched/domains/cpu0/
    # ls /sys/kernel/debug/sched/domains/cpu1/
    #
    # echo 0 > cgroup/A1/cpuset.cpus
    # ls /sys/kernel/debug/sched/domains/cpu0/
    # ls /sys/kernel/debug/sched/domains/cpu1/
    #

Here CPU1 should have a sched domain re-attached.

Signed-off-by: Pierre Gondois <pierre.gondois@xxxxxxx>
---
   kernel/cgroup/cpuset.c | 12 ++++++++----
   1 file changed, 8 insertions(+), 4 deletions(-)

diff --git a/kernel/cgroup/cpuset.c b/kernel/cgroup/cpuset.c
index 58e6f18f01c1..e3eb27ff9b68 100644
--- a/kernel/cgroup/cpuset.c
+++ b/kernel/cgroup/cpuset.c
@@ -1680,11 +1680,15 @@ static void update_cpumasks_hier(struct cpuset *cs, struct tmpmasks *tmp,
 		 * empty cpuset is changed, we need to rebuild sched domains.
 		 * On default hierarchy, the cpuset needs to be a partition
 		 * root as well.
+		 * Also rebuild sched domains if the cpuset of an isolated
+		 * partition changed.
 		 */
-		if (!cpumask_empty(cp->cpus_allowed) &&
-		    is_sched_load_balance(cp) &&
-		   (!cgroup_subsys_on_dfl(cpuset_cgrp_subsys) ||
-		    is_partition_valid(cp)))
+		if ((!cpumask_empty(cp->cpus_allowed) &&
+		     is_sched_load_balance(cp) &&
+		     (!cgroup_subsys_on_dfl(cpuset_cgrp_subsys) ||
+		      is_partition_valid(cp))) ||
+		    (cp->partition_root_state == PRS_ISOLATED ||
+		     cp->partition_root_state == PRS_INVALID_ISOLATED))
 			need_rebuild_sched_domains = true;
 
 		rcu_read_lock();

Thanks for spotting the problem and sending out a patch to fix it.
However, it should be done in update_cpumask(). The sched domain
rebuild in update_cpumasks_hier() is supposed to be used for impacted
descendant cpusets lower down in the hierarchy.
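
For illustration, a rebuild check placed at the end of update_cpumask()
might look like the sketch below. This is only a sketch against the
v6.5-era cpuset.c (cs is update_cpumask()'s cpuset argument), not the
actual upstream change:

	/*
	 * Illustrative sketch only -- not the merged fix. At the end of
	 * update_cpumask(), the written-to cpuset (cs) is known directly,
	 * so an isolated partition's CPU change can trigger the rebuild
	 * here instead of in the update_cpumasks_hier() descendant walk.
	 */
	if (cs->partition_root_state == PRS_ISOLATED ||
	    cs->partition_root_state == PRS_INVALID_ISOLATED)
		rebuild_sched_domains_locked();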

Anyway, I believe this problem should have been fixed in commit
a86ce68078b2 ("cgroup/cpuset: Extract out CS_CPU_EXCLUSIVE &
CS_SCHED_LOAD_BALANCE handling") recently merged into the v6.6 kernel.
Would you mind testing the latest upstream kernel to see if this problem
is gone or not?

Yes, right, the v6.6-rc2 kernel doesn't have this issue. My bad for not
updating it ...

However, I think the second issue described in the cover letter can still be
reproduced on v6.6-rc2. The present patch was not aiming to fix it, though.

Commands:
# mkdir cgroup
# mount -t cgroup2 none cgroup/
# mkdir cgroup/A1 cgroup/B1
# echo "+cpuset" > cgroup/cgroup.subtree_control
# echo 0-3 > cgroup/A1/cpuset.cpus
# echo isolated > cgroup/A1/cpuset.cpus.partition
# echo 4-6 > cgroup/B1/cpuset.cpus
# cat cgroup/A1/cpuset.cpus.partition
isolated
# echo 0-4 > cgroup/A1/cpuset.cpus
# cat cgroup/A1/cpuset.cpus.partition
isolated invalid (Cpu list in cpuset.cpus not exclusive)
# echo 0-3 > cgroup/A1/cpuset.cpus
# cat cgroup/A1/cpuset.cpus.partition
isolated invalid (Cpu list in cpuset.cpus not exclusive)

Cgroup A1 should become a valid isolated partition once its CPUs are
exclusive again, I believe.

Yes, the current code doesn't always detect that an invalid partition can become valid again and turn it back on. I will try to fix that. In the meantime, a workaround is to do:

# echo member > cgroup/A1/cpuset.cpus.partition
# echo isolated > cgroup/A1/cpuset.cpus.partition

Thanks,
Longman