Re: [PATCH] cgroup/cpuset: update parent subparts cpumask while holding css refcnt

From: Miaohe Lin
Date: Tue Jul 11 2023 - 21:57:03 EST


On 2023/7/11 19:52, Michal Koutný wrote:
> On Tue, Jul 11, 2023 at 10:52:02AM +0800, Miaohe Lin <linmiaohe@xxxxxxxxxx> wrote:
>> commit 2bdfd2825c9662463371e6691b1a794e97fa36b4
>> Author: Waiman Long <longman@xxxxxxxxxx>
>> Date: Wed Feb 2 22:31:03 2022 -0500
>>
>> cgroup/cpuset: Fix "suspicious RCU usage" lockdep warning
>
> Aha, thanks for the pointer.
>
> I've also found a paragraph in [1]:
>> In addition, the -rt patchset turns spinlocks into sleeping locks so
>> that the corresponding critical sections can be preempted, which also
>> means that these sleeplockified spinlocks (but not other sleeping
>> locks!) may be acquired within -rt-Linux-kernel RCU read-side critical
>> sections.
>
> That suggests (together with practical use) that the discussed spinlocks
> should be fine in an RCU read-side section. The likely reason lies deeper,
> in generate_sched_domains(), which does kmalloc(..., GFP_KERNEL).

update_parent_subparts_cpumask() would call update_flag(), which does kmemdup(..., GFP_KERNEL)?

>
> Alas, update_cpumask_hier() still calls generate_sched_domains(); OTOH,
> update_parent_subparts_cpumask() doesn't seem to.

It seems update_parent_subparts_cpumask() doesn't call generate_sched_domains().

>
> The idea of not releasing rcu_read_lock() in the update_cpumask() iteration
> (instead of the technically unneeded refcnt bump) would have to be
> verified with CONFIG_PROVE_RCU && CONFIG_LOCKDEP. WDYT?

The idea of releasing rcu_read_lock() in the update_cpumask() iteration was initially introduced
by the commit below:

commit d7c8142d5a5534c3c7de214e35a40a493a32b98e
Author: Waiman Long <longman@xxxxxxxxxx>
Date: Thu Sep 1 16:57:43 2022 -0400

cgroup/cpuset: Make partition invalid if cpumask change violates exclusivity rule

Currently, changes in "cpuset.cpus" of a partition root are not allowed if
they violate the sibling cpu exclusivity rule when the check is done
in the validate_change() function. That is inconsistent with the
other cpuset changes that are always allowed but may make a partition
invalid.

Update the cpuset code to allow cpumask change even if it violates the
sibling cpu exclusivity rule, but invalidate the partition instead
just like the other changes. However, other sibling partitions with
conflicting cpumasks will also be invalidated in order not to violate
the exclusivity rule. This behavior is specific to this partition
rule violation.

Note that a previous commit has made sibling cpu exclusivity rule check
the last check of validate_change(). So if -EINVAL is returned, we can
be sure that sibling cpu exclusivity rule violation is the only rule
that is broken.

It would be really helpful if @Waiman could figure this out.

Thanks both.