Re: [PATCH v2 2/2] cgroup/cpuset: Don't update tasks' cpumasks for cpu offline events

From: Peter Zijlstra
Date: Sat Feb 04 2023 - 04:40:53 EST


On Thu, Feb 02, 2023 at 09:32:00AM -0500, Waiman Long wrote:
> It is a known issue that when a task is in a non-root v1 cpuset, a cpu
> offline event will cause that cpu to be lost from the task's cpumask
> permanently as the cpuset's cpus_allowed mask won't get back that cpu
> when it becomes online again. A possible workaround for this type of
> cpu offline/online sequence is to leave the offline cpu in the task's
> cpumask and do the update only if new cpus are added. It also has the
> benefit of reducing the overhead of a cpu offline event.
>
> Note that the scheduler is able to ignore the offline cpus and so
> leaving offline cpus in the cpumask won't do any harm.
>
> Now with v2, only the cpu online events will cause a call to
> hotplug_update_tasks() to update the tasks' cpumasks. For tasks
> in a non-root v1 cpuset, the situation is a bit different. The cpu
> offline event will not cause change to a task's cpumask. Neither does a
> subsequent cpu online event because "cpuset.cpus" had that offline cpu
> removed and its subsequent onlining won't be registered as a change
> to the cpuset. An exception is when all the cpus in the original
> "cpuset.cpus" have gone offline once. In that case, "cpuset.cpus" will
> become empty which will force task migration to its parent. A task's
> cpumask will also be changed if set_cpus_allowed_ptr() is somehow called
> for whatever reason.
>
> Of course, this patch can cause a discrepancy between v1's "cpuset.cpus"
> and and its tasks' cpumasks. Howver, it can also largely work around
> the offline cpu losing problem with v1 cpuset.

I don't thikn you can simply not update on offline, even if
effective_cpus doesn't go empty, because the intersection between
task_cpu_possible_mask() and effective_cpus might have gone empty.

Very fundamentally, the introduction of task_cpu_possible_mask() means
that you now *HAVE* to always consider affinity settings per-task, you
cannot group them anymore.