Re: [PATCH 1/2] cgroup/cpuset: Keep current cpus list if cpus affinity was explicitly set

From: Tejun Heo
Date: Thu Jul 28 2022 - 13:23:15 EST


Hello,

On Wed, Jul 27, 2022 at 08:58:14PM -0400, Waiman Long wrote:
> It was found that any change to the current cpuset hierarchy may reset
> the cpus_allowed list of the tasks in the affected cpusets to the
> default cpuset value even if those tasks have cpus affinity explicitly
> set by the users before. That is especially easy to trigger under a
> cgroup v2 environment where writing "+cpuset" to the root cgroup's
> cgroup.subtree_control file will reset the cpus affinity of all the
> processes in the system.
>
> That is especially problematic in a nohz_full environment where the
> tasks running in the nohz_full CPUs usually have their cpus affinity
> explicitly set and will behave incorrectly if cpus affinity changes.
>
> Fix this problem by adding a flag in the task structure to indicate that
> a task has had its cpus affinity explicitly set, and make the cpuset
> code not change its cpus_allowed list unless the user-chosen cpu
> list is no longer a subset of the cpus_allowed list of the cpuset itself.
>
> With that change in place, it was verified that tasks that have their
> cpus affinity explicitly set will not be affected by changes made to
> the v2 cgroup.subtree_control files.

I think the underlying cause here is cpuset overwriting the cpumask the user
configured, but that's a longer discussion.

> +/*
> + * Don't change the cpus_allowed list if cpus affinity has been explicitly
> + * set before unless the current cpu list is not a subset of the new cpu list.
> + */
> +static int cpuset_set_cpus_allowed_ptr(struct task_struct *p,
> +				       const struct cpumask *new_mask)
> +{
> +	if (p->cpus_affinity_set && cpumask_subset(p->cpus_ptr, new_mask))
> +		return 0;
> +
> +	p->cpus_affinity_set = 0;
> +	return set_cpus_allowed_ptr(p, new_mask);
> +}
> +}

I wonder whether the more predictable behavior would be to never reset the
cpumask when it's already a subset of new_mask, regardless of whether the
affinity was explicitly set. Also, shouldn't this check p->cpus_mask instead
of p->cpus_ptr?

Thanks.

--
tejun