Re: [PATCH] cgroup/cpuset: Optimize update_tasks_nodemask()

From: Tejun Heo
Date: Wed Nov 23 2022 - 12:05:14 EST


On Wed, Nov 23, 2022 at 08:21:57AM +0000, haifeng.xu wrote:
> When change the 'cpuset.mems' under some cgroup, system will hung
> for a long time. From the dmesg, many processes or theads are
> stuck in fork/exit. The reason is show as follows.
>
> thread A:
> cpuset_write_resmask /* takes cpuset_rwsem */
> ...
> update_tasks_nodemask
> mpol_rebind_mm /* waits mmap_lock */
>
> thread B:
> worker_thread
> ...
> cpuset_migrate_mm_workfn
> do_migrate_pages /* takes mmap_lock */
>
> thread C:
> cgroup_procs_write /* takes cgroup_mutex and cgroup_threadgroup_rwsem */
> ...
> cpuset_can_attach
> percpu_down_write /* waits cpuset_rwsem */
>
> Once update the nodemasks of cpuset, thread A wakes up thread B to
> migrate mm. But when thread A iterates through all tasks, including
> child threads and group leader, it has to wait the mmap_lock which
> has been take by thread B. Unfortunately, thread C wants to migrate
> tasks into cgroup at this moment, it must wait thread A to release
> cpuset_rwsem. If thread B spends much time to migrate mm, the
> fork/exit which acquire cgroup_threadgroup_rwsem also need to
> wait for a long time.
>
> There is no need to migrate the mm of child threads which is
> shared with group leader.

This is only a problem in cgroup1 and cgroup1 doesn't require the threads of
a given task to be in the same cgroup. I don't think you can optimize it
this way.

Thanks.

--
tejun