Re: [PATCH v4.11] cgroup, net_cls: iterate the fds of only the tasks which are being migrated

From: David Miller
Date: Wed Mar 22 2017 - 13:33:53 EST


From: Tejun Heo <tj@xxxxxxxxxx>
Date: Tue, 14 Mar 2017 19:25:56 -0400

> The net_cls controller controls the classid field of each socket which
> is associated with the cgroup. Because the classid is per-socket
> attribute, when a task migrates to another cgroup or the configured
> classid of the cgroup changes, the controller needs to walk all
> sockets and update the classid value, which was implemented by
> 3b13758f51de ("cgroups: Allow dynamically changing net_classid").
>
> While the approach is not scalable, migrating tasks which have a lot
> of fds attached to them is rare and the cost is born by the ones
> initiating the operations. However, for simplicity, both the
> migration and classid config change paths call update_classid() which
> scans all fds of all tasks in the target css. This is an overkill for
> the migration path which only needs to cover a much smaller subset of
> tasks which are actually getting migrated in.
>
> On cgroup v1, this can lead to unexpected scalability issues when one
> tries to migrate a task or process into a net_cls cgroup which already
> contains a lot of fds. Even if the migration traget doesn't have many
> to get scanned, update_classid() ends up scanning all fds in the
> target cgroup which can be extremely numerous.
>
> Unfortunately, on cgroup v2 which doesn't use net_cls, the problem is
> even worse. Before bfc2cf6f61fc ("cgroup: call subsys->*attach() only
> for subsystems which are actually affected by migration"), cgroup core
> would call the ->css_attach callback even for controllers which don't
> see actual migration to a different css.
>
> As net_cls is always disabled but still mounted on cgroup v2, whenever
> a process is migrated on the cgroup v2 hierarchy, net_cls sees
> identity migration from root to root and cgroup core used to call
> ->css_attach callback for those. The net_cls ->css_attach ends up
> calling update_classid() on the root net_cls css to which all
> processes on the system belong to as the controller isn't used. This
> makes any cgroup v2 migration O(total_number_of_fds_on_the_system)
> which is horrible and easily leads to noticeable stalls triggering RCU
> stall warnings and so on.
>
> The worst symptom is already fixed in upstream by bfc2cf6f61fc
> ("cgroup: call subsys->*attach() only for subsystems which are
> actually affected by migration"); however, backporting that commit is
> too invasive and we want to avoid other cases too.
>
> This patch updates net_cls's cgrp_attach() to iterate fds of only the
> processes which are actually getting migrated. This removes the
> surprising migration cost which is dependent on the total number of
> fds in the target cgroup. As this leaves write_classid() the only
> user of update_classid(), open-code the helper into write_classid().
>
> Reported-by: David Goode <dgoode@xxxxxx>
> Fixes: 3b13758f51de ("cgroups: Allow dynamically changing net_classid")
> Cc: stable@xxxxxxxxxxxxxxx # v4.4+
> Cc: Nina Schiff <ninasc@xxxxxx>
> Cc: David S. Miller <davem@xxxxxxxxxxxxx>
> Signed-off-by: Tejun Heo <tj@xxxxxxxxxx>

Applied, thanks.