Re: [PATCH 3/3] cgroup: Implement cgroup2 basic CPU usage accounting

From: Peter Zijlstra
Date: Tue Aug 29 2017 - 10:33:20 EST



So I mostly like. On accounting it only adds to the immediate cgroup (if
it has a parent, aka !root).

On update it does a DFS of all sub-groups and propagates the deltas up
to the requested group.

On Fri, Aug 11, 2017 at 09:37:54AM -0700, Tejun Heo wrote:
> +/**
> + * cgroup_cpu_stat_updated - keep track of updated cpu_stat
> + * @cgrp: target cgroup
> + * @cpu: cpu on which cpu_stat was updated
> + *
> + * @cgrp's cpu_stat on @cpu was updated. Put it on the parent's matching
> + * cpu_stat->updated_children list.
> + */
> +static void cgroup_cpu_stat_updated(struct cgroup *cgrp, int cpu)
> +{
> + raw_spinlock_t *cpu_lock = per_cpu_ptr(&cgroup_cpu_stat_lock, cpu);
> + struct cgroup *parent;
> + unsigned long flags;
> +
> + /*
> + * Speculative already-on-list test. This may race leading to
> + * temporary inaccuracies, which is fine.
> + *
> + * Because @parent's updated_children is terminated with @parent
> + * instead of NULL, we can tell whether @cgrp is on the list by
> + * testing the next pointer for NULL.
> + */
> + if (cgroup_cpu_stat(cgrp, cpu)->updated_next)
> + return;
> +
> + raw_spin_lock_irqsave(cpu_lock, flags);
> +
> + /* put @cgrp and all ancestors on the corresponding updated lists */
> + for (parent = cgroup_parent(cgrp); parent;
> + cgrp = parent, parent = cgroup_parent(cgrp)) {
> + struct cgroup_cpu_stat *cstat = cgroup_cpu_stat(cgrp, cpu);
> + struct cgroup_cpu_stat *pcstat = cgroup_cpu_stat(parent, cpu);
> +
> + /*
> + * Both additions and removals are bottom-up. If a cgroup
> + * is already in the tree, all ancestors are.
> + */
> + if (cstat->updated_next)
> + break;
> +
> + cstat->updated_next = pcstat->updated_children;
> + pcstat->updated_children = cgrp;
> + }
> +
> + raw_spin_unlock_irqrestore(cpu_lock, flags);
> +}

> +static struct cgroup_cpu_stat *cgroup_cpu_stat_account_begin(struct cgroup *cgrp)
> +{
> + struct cgroup_cpu_stat *cstat;
> +
> + cstat = get_cpu_ptr(cgrp->cpu_stat);
> + u64_stats_update_begin(&cstat->sync);
> + return cstat;
> +}
> +
> +static void cgroup_cpu_stat_account_end(struct cgroup *cgrp,
> + struct cgroup_cpu_stat *cstat)
> +{
> + u64_stats_update_end(&cstat->sync);
> + cgroup_cpu_stat_updated(cgrp, smp_processor_id());
> + put_cpu_ptr(cstat);
> +}
> +
> +void __cgroup_account_cputime(struct cgroup *cgrp, u64 delta_exec)
> +{
> + struct cgroup_cpu_stat *cstat;
> +
> + cstat = cgroup_cpu_stat_account_begin(cgrp);
> + cstat->cputime.sum_exec_runtime += delta_exec;
> + cgroup_cpu_stat_account_end(cgrp, cstat);
> +}

What I don't get is why you need cgroup_cpu_stat_updated(). That is, I
see you use it to keep the keep the DFS 'stack' up-to-date, but what I
don't see is why you'd need that.

Have a look at walk_tg_tree_from(), I think we can do something like
that on struct cgroup_subsys_state, it has that children list and the
parent pointer.

And yes, walk_tg_tree_from() is tricky, it always takes a fair while to
remember how it works.