Re: [PATCH 3/3] cgroup: implement cgroup.subtree_populated for the default hierarchy

From: Tejun Heo
Date: Thu Apr 10 2014 - 09:08:59 EST


Hey, Serge.

On Thu, Apr 10, 2014 at 05:08:55AM +0200, Serge E. Hallyn wrote:
> Quoting Tejun Heo (tj@xxxxxxxxxx):
> > * It delivers events by forking and execing a userland binary
> > specified as the release_agent. This is a long deprecated method of
> > notification delivery. It's extremely heavy, slow and cumbersome to
> > integrate with larger infrastructure.
>
> (Not seriously worried about this, but it's a point worth considering)
> It does have one advantage though: if the userspace agent goes bad,
> cgroups can still be removed on empty.
>
> Do you plan on keeping release-on-empty around? I assume only for a
> while?

The new mechanism is only for the unified hierarchy. The old one will
be kept around for other hierarchies.

> Do you think there is any value in having a simpler "remove-when-empty"
> file? Doesn't call out to userspace, just drops the cgroup when there
> are no more tasks or sub-cgroups?

I don't think so. Implementing such simplistic mechanism in userland
is trivial and even independent failover mechanisms can be easily
implemented from userland as multiple entities can set up watches. I
don't think there's much value in providing another mechanism from
kernel side. The only reason why release_agent thing got as complex
as it is is because the mechanism is fundamentally flawed - clumsy
delivery, no multiple watches, single watch point - so people tried to
work around it by adding event filtering from kernel side, which is
quite backwards IMHO. With proper event mechanism, everything should
be easily achievable from userland side.

> > * Events are filtered from the kernel side. "notify_on_release" file
> > is used to subscribe to or suppres release event and events are not
> > generated if a cgroup becomes empty by moving the last task out of
> > it; however, event is generated if it becomes empty because the last
> > child cgroup is removed. This is inconsistent, awkward and
>
> Hm, maybe I'm misreading but this doesn't seem right. If I move
> a task into x1 and kill the task, x1 goes away. Likewise if I
> create x1/y1, and rmdir y1, x1 goes away. I suspect I'm misunderstanding
> the case in which you say it doesn't happen?

The case where you move a task out of x1/y1 to another cgroup doesn't
generate an event. One could say that that's unnecessary because the
mover knows that the cgroup is becoming empty; however, it excludes
any cases where there are more than one actors and the same can be
said for cases when the actor is removing a child.

> > This patch implements interface file "cgroup.subtree_populated" which
> > can be used to monitor whether the cgroup's subhierarchy has tasks in
> > it or not. Its value is 1 if there is no task in the cgroup and its
>
> I think you meant this backward? It's 1 if there is *any task in
> the cgroup and its descendants, else 0?

Oops, yeap. Will update.

Thanks!

--
tejun
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/