Re: [External] Re: [PATCH] cgroup/rstat: record the cumulative per-cpu time of cgroup and its descendants

From: Tejun Heo
Date: Thu Jul 27 2023 - 13:44:34 EST


On Thu, Jul 27, 2023 at 08:05:44PM +0800, Hao Jia wrote:
>
>
> On 2023/7/19 Hao Jia wrote:
> >
> >
> > On 2023/7/19 Tejun Heo wrote:
> > > On Tue, Jul 18, 2023 at 06:08:50PM +0800, Hao Jia wrote:
> > > > https://github.com/jiaozhouxiaojia/cgv2-stat-percpu_test/tree/main
> > >
> > > So, we run `stress -c 1` for 1 second in the asdf/test0 cgroup and
> > > asdf/cpu.stat correctly reports the cumulative usage. After removing
> > > asdf/test0 cgroup, asdf's usage_usec is still there. What's missing here?
> >
> > Sorry, some of my wording may have misled you.
> >
> > Yes, cpu.stat will display the cumulative **global** cpu time of the
> > cgroup and its descendants (the corresponding kernel variable is
> > "cgrp->bstat"), and it will not be lost when the child cgroup is
> > removed.
> >
> > Similarly, we need a **per-cpu** variable to record the accumulated
> > per-cpu time of cgroup and its descendants.
> > The existing kernel variable "cgroup_rstat_cpu(cgrp, cpu)->bstat" does
> > not satisfy this: it only records the per-cpu time of the cgroup itself.
> > So I am trying to add "cgroup_rstat_cpu(cgrp, cpu)->cumul_bstat" to
> > record the per-cpu time of the cgroup and its descendants.
> >
> > In order to verify the correctness of my patch, I wrote a kernel module
> > to compare the results of calculating the per-cpu time of cgroup and its
> > descendants in two ways:
> >   Method 1. Traverse and add the per-cpu rstatc->bstat of cgroup and
> > each of its descendants.
> >   Method 2. Directly read "cgroup_rstat_cpu(cgrp, cpu)->cumul_bstat" in
> > the kernel.
> >
> > When the child cgroup is not removed, the results calculated by the two
> > methods should be equal.
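
To make the two methods concrete, here is a minimal user-space sketch of
the comparison. It is not the patch and not kernel code: struct
toy_cgroup, the toy_* helpers, NR_CPUS and MAX_CHILDREN are all
hypothetical names made up for illustration. It also assumes time is
propagated to ancestors at charge time, whereas rstat propagates on
flush.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define NR_CPUS 4       /* assumption: small fixed CPU count for the sketch */
#define MAX_CHILDREN 8  /* assumption: fixed fan-out to avoid allocation */

/* Hypothetical stand-in for a cgroup; not the kernel's struct cgroup. */
struct toy_cgroup {
	struct toy_cgroup *parent;
	struct toy_cgroup *children[MAX_CHILDREN];
	size_t nr_children;
	uint64_t bstat[NR_CPUS];       /* per-cpu time of this cgroup only */
	uint64_t cumul_bstat[NR_CPUS]; /* per-cpu time of cgroup + descendants */
};

static void toy_link(struct toy_cgroup *parent, struct toy_cgroup *child)
{
	assert(parent->nr_children < MAX_CHILDREN);
	child->parent = parent;
	parent->children[parent->nr_children++] = child;
}

/* Charge usec on one CPU: self counter once, cumulative counter up the tree. */
static void toy_charge(struct toy_cgroup *cg, int cpu, uint64_t usec)
{
	assert(cpu >= 0 && cpu < NR_CPUS);
	cg->bstat[cpu] += usec;
	for (struct toy_cgroup *pos = cg; pos; pos = pos->parent)
		pos->cumul_bstat[cpu] += usec;
}

/* Method 1: traverse the subtree and add up each cgroup's own bstat. */
static uint64_t toy_sum_subtree(const struct toy_cgroup *cg, int cpu)
{
	uint64_t sum = cg->bstat[cpu];

	for (size_t i = 0; i < cg->nr_children; i++)
		sum += toy_sum_subtree(cg->children[i], cpu);
	return sum;
}

/* Method 2: read the cumulative per-cpu counter directly. */
static uint64_t toy_read_cumul(const struct toy_cgroup *cg, int cpu)
{
	return cg->cumul_bstat[cpu];
}

/* Remove a child; the parent's cumulative counters are left untouched. */
static void toy_unlink(struct toy_cgroup *child)
{
	struct toy_cgroup *p = child->parent;

	if (!p)
		return;
	for (size_t i = 0; i < p->nr_children; i++) {
		if (p->children[i] == child) {
			p->children[i] = p->children[--p->nr_children];
			break;
		}
	}
	child->parent = NULL;
}
```

With asdf as the parent of test0, charging time to test0 on one CPU
makes toy_sum_subtree(&asdf, cpu) and toy_read_cumul(&asdf, cpu) agree.
After toy_unlink(&test0), only the cumulative counter still holds the
removed child's time, which is the behavior being asked for.
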
> >
> > > What are you adding?
> > I want to add a **per-cpu variable** to record the cumulative per-cpu
> > time of cgroup and its descendants, which is similar to the variable
> > "cgrp->bstat", but it is a per-cpu variable.
> > It is very useful and convenient for calculating the usage of cgroup on
> > each cpu, and its behavior is similar to the "cpuacct.usage*" interface
> > of cgroup v1.
> >
>
> Hello Tejun,
>
> I'm not sure whether I explained it clearly. Do you understand what I mean?
>
> Would you mind adding a variable like this to facilitate per-cpu usage
> calculations and migration from cgroup v1 to cgroup v2?

Oh yeah, I do. I'm just thinking whether we also want to expose that in the
cgroupfs. We are currently not showing anything per-cpu and the output
formatting gets nasty with a huge number of CPUs, so maybe that's not going
to work out all that well. Anyways, I'll get back to you next week.

Thanks.

--
tejun