Re: [PATCH] cgroup/pids: turn cgroup_subsys->free() into cgroup_subsys->release() to fix the accounting

From: Tejun Heo
Date: Thu Jan 31 2019 - 09:56:26 EST


On Mon, Jan 28, 2019 at 05:00:13PM +0100, Oleg Nesterov wrote:
> The only user of cgroup_subsys->free() callback is pids_cgrp_subsys which
> needs pids_free() to uncharge the pid.
>
> However, ->free() is called from __put_task_struct()->cgroup_free() and this
> is too late. Even the trivial program which does
>
>     for (;;) {
>         int pid = fork();
>         assert(pid >= 0);
>         if (pid)
>             wait(NULL);
>         else
>             exit(0);
>     }
>
> can hit the pids.max limit because release_task()->call_rcu(delayed_put_task_struct)
> implies an RCU grace period after the task/pid goes away and before the final
> put(), so the pid stays charged until that grace period has elapsed.
>
> Test-case:
>
>     mkdir -p /tmp/CG
>     mount -t cgroup2 none /tmp/CG
>     echo '+pids' > /tmp/CG/cgroup.subtree_control
>
>     mkdir /tmp/CG/PID
>     echo 2 > /tmp/CG/PID/pids.max
>
>     perl -e 'while ($p = fork) { wait; } $p // die "fork failed: $!\n"' &
>     echo $! > /tmp/CG/PID/cgroup.procs
>
> Without this patch the forking process fails soon after migration.
>
> Rename cgroup_subsys->free() to cgroup_subsys->release() and move the callsite
> into the new helper, cgroup_release(), called by release_task() which actually
> frees the pid(s).
>
> Reported-by: Herton R. Krzesinski <hkrzesin@xxxxxxxxxx>
> Reported-by: Jan Stancek <jstancek@xxxxxxxxxx>
> Signed-off-by: Oleg Nesterov <oleg@xxxxxxxxxx>
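
For reference, the fork/wait loop quoted above can be written as a
self-contained C sketch. The function name fork_wait_loop() and the
iteration bound are illustrative assumptions, not part of the patch; the
bounded loop just makes the reproducer easy to run and stop. Inside a
cgroup with a small pids.max, an unpatched kernel would be expected to
return early here with a fork() failure.

```c
#include <assert.h>
#include <stdio.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/*
 * Bounded variant of the fork/wait loop from the patch description.
 * fork_wait_loop() is a hypothetical helper name used for illustration.
 *
 * Forks one child at a time and reaps it before forking the next, so at
 * most two tasks (parent + one child) exist at any moment.
 * Returns the number of children that were forked and reaped; stops
 * early if fork() or waitpid() fails.
 */
int fork_wait_loop(int iterations)
{
	int done = 0;

	assert(iterations >= 0);

	for (int i = 0; i < iterations; i++) {
		pid_t pid = fork();

		if (pid < 0) {
			/* e.g. EAGAIN when the pids controller refuses the fork */
			perror("fork");
			break;
		}
		if (pid == 0)
			_exit(0);	/* child: exit immediately */

		/* parent: reap the child before the next iteration */
		if (waitpid(pid, NULL, 0) != pid) {
			perror("waitpid");
			break;
		}
		done++;
	}
	return done;
}
```

Calling fork_wait_loop(100000) from main() after moving the process into
/tmp/CG/PID (as in the test-case above) and comparing the return value
with the requested count shows whether any fork() was refused.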

Applied to cgroup/for-5.0.

Thanks, Oleg.

--
tejun