Re: [RFC PATCH] cgroup: Return error when attempting to migrate a zombie process

From: Tejun Heo
Date: Fri May 05 2023 - 15:04:52 EST


Hello,

On Wed, May 03, 2023 at 02:53:59PM +0200, Michal Koutný wrote:
> Zombies aren't migrated. However, return value of a migration write may
> suggest a zombie process was migrated and causing confusion about lack
> of cgroup.events:populated between origin and target cgroups (e.g.
> target cgroup rmdir).
>
> Notify the users about no effect of their action by a return value.
> (update_dfl_csses migration of zombies still silently passes since it is
> not meant to be user-visible migration anyway.)
>
> Suggested-by: Benjamin Berg <benjamin@xxxxxxxxxxxxxxxx>
> Signed-off-by: Michal Koutný <mkoutny@xxxxxxxx>
> ---
> kernel/cgroup/cgroup.c | 3 ++-
> 1 file changed, 2 insertions(+), 1 deletion(-)
>
> Reasons for RFC:
> 1) Some users may notice the change,
> 2) EINVAL vs ESRCH,
> 3) add a selftest?
>
> diff --git a/kernel/cgroup/cgroup.c b/kernel/cgroup/cgroup.c
> index 625d7483951c..306547dd7b76 100644
> --- a/kernel/cgroup/cgroup.c
> +++ b/kernel/cgroup/cgroup.c
> @@ -2968,7 +2968,8 @@ struct task_struct *cgroup_procs_write_start(char *buf, bool threadgroup,
> * become trapped in a cpuset, or RT kthread may be born in a
> * cgroup with no rt_runtime allocated. Just say no.
> */
> - if (tsk->no_cgroup_migration || (tsk->flags & PF_NO_SETAFFINITY)) {
> + if (tsk->no_cgroup_migration || (tsk->flags & PF_NO_SETAFFINITY) ||
> + !atomic_read(&tsk->signal->live)) {

This seems racy to me. The liveness state can change between this check and
the PF_EXITING check in cgroup_migrate_add_task(), right? Wouldn't it be
better to just count how many tasks were actually migrated and return -ESRCH
if none were?
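
To illustrate the counting idea, here is a rough userspace sketch, not
kernel code and untested against the tree. All the demo_* names are made up
for illustration, and the "exiting" flag only stands in for the PF_EXITING
check. The point is that the decision is taken at the place where a task is
actually skipped, so there is no separate earlier check to race with.

```c
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for a task; this is not the kernel's task_struct. */
struct demo_task {
	bool exiting;		/* models the PF_EXITING state */
	int cgroup_id;		/* cgroup the task currently belongs to */
};

/*
 * Move one task to @dst unless it is exiting.
 * Returns true only if the task was actually moved.
 */
static bool demo_migrate_add_task(struct demo_task *t, int dst)
{
	if (t->exiting)
		return false;	/* zombies are skipped, as they are today */

	t->cgroup_id = dst;
	return true;
}

/*
 * Migrate @n tasks to @dst and count how many really moved.
 * Returns 0 if at least one task was migrated, -ESRCH if none was.
 */
static int demo_migrate(struct demo_task *tasks, size_t n, int dst)
{
	size_t nr_migrated = 0;
	size_t i;

	for (i = 0; i < n; i++)
		if (demo_migrate_add_task(&tasks[i], dst))
			nr_migrated++;

	return nr_migrated ? 0 : -ESRCH;
}
```

With this shape, a write that targets only zombies fails with -ESRCH, while
a group in which at least one task is still alive keeps succeeding.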

Thanks.

--
tejun