Re: [RFC PATCH] cpuacct: per-cgroup utime/stime statistics - v1

From: KAMEZAWA Hiroyuki
Date: Wed Mar 11 2009 - 05:14:43 EST


On Wed, 11 Mar 2009 14:23:16 +0530
Bharata B Rao <bharata@xxxxxxxxxxxxxxxxxx> wrote:

> On Wed, Mar 11, 2009 at 09:38:12AM +0900, KAMEZAWA Hiroyuki wrote:
> > On Tue, 10 Mar 2009 18:12:08 +0530
> > Bharata B Rao <bharata@xxxxxxxxxxxxxxxxxx> wrote:
> >
> > > Hi,
> > >
> > > Based on the comments received during my last post
> > > (http://lkml.org/lkml/2009/2/25/129), here is a fresh attempt
> > > to get per-cgroup utime/stime statistics as part of cpuacct controller.
> > >
> > > This patch adds a new file, cpuacct.stat, which displays two stats:
> > > utime and stime. I wasn't sure how useful per-cgroup guest and steal
> > > times would be, so I am not including them here.
> > >
> > > Note that I am using percpu_counter for collecting these two stats.
> > > Since the percpu_counter subsystem doesn't protect the read side, readers
> > > could theoretically obtain incorrect values for these stats on 32-bit systems.
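(Roughly, the scheme described above looks like the sketch below; the field
and function names are illustrative, not necessarily what the patch uses.)

enum cpuacct_stat_index {
	CPUACCT_STAT_USER,	/* utime */
	CPUACCT_STAT_SYSTEM,	/* stime */
	CPUACCT_STAT_NSTATS,
};

struct cpuacct {
	struct cgroup_subsys_state css;
	u64 *cpuusage;		/* existing per-cpu usage counter */
	struct percpu_counter cpustat[CPUACCT_STAT_NSTATS];
};

/* write side: called from the accounting hooks, once per tick */
static void cpuacct_charge_stat(struct cpuacct *ca,
				enum cpuacct_stat_index idx,
				cputime_t cputime)
{
	percpu_counter_add(&ca->cpustat[idx], cputime);
}

/*
 * read side for cpuacct.stat: percpu_counter_read() is lockless, so the
 * value can lag by up to batch * nr_cpus, and a 32-bit reader can in
 * theory see a torn 64-bit value.
 */
static s64 cpuacct_read_stat(struct cpuacct *ca, enum cpuacct_stat_index idx)
{
	return percpu_counter_read(&ca->cpustat[idx]);
}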
> >
> > Using percpu_counter_read() does mean that, but is it okay to ignore the
> > "batch" number? (see FBC_BATCH)
>
> I would think it might be OK, with the understanding that reads are not
> a frequent operation. The default value of percpu_counter_batch is 32.
> Ideally it should have been possible to set this value independently
> for each percpu_counter; that way, users could choose an appropriate
> batch value based on the usage pattern of their counter.
>
Hmm, from my point of view, stime/utime are in milliseconds, which is a
large enough unit that the value can be expected to be "correct".
If reads are not frequent, I would prefer the precise value.
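BTW, about the per-counter batch: __percpu_counter_add() already takes an
explicit batch argument, so the charge helper could pass a smaller value
without touching the percpu_counter code. A sketch (the batch value here is
arbitrary and would need measurement):

#define CPUACCT_BATCH	4	/* illustrative; default percpu_counter_batch is 32 */

static void cpuacct_charge_stat(struct cpuacct *ca,
				enum cpuacct_stat_index idx,
				cputime_t cputime)
{
	__percpu_counter_add(&ca->cpustat[idx], cputime, CPUACCT_BATCH);
}

A smaller batch tightens the error seen by percpu_counter_read() at the cost
of taking the counter lock more often on the write side.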


> >
> >
> > > I hope occasional wrong values are not too much of a concern for
> > > statistics like this. If it is a problem, we have to either fix
> > > percpu_counter or do it all ourselves, as Kamezawa attempted
> > > for cpuacct.usage (http://lkml.org/lkml/2009/3/4/14).
> > >
> > Hmm, is percpu_counter_sum() bad?
>
> It is slow, and it doesn't do exactly what we want. It just adds the
> 32-bit per-cpu counters into the global 64-bit counter under the lock and
> returns the result, but it doesn't clear the 32-bit per-cpu counters after
> accumulating them into the 64-bit counter.
>
> If it is OK to be a bit slower on the read side, we could have something
> like percpu_counter_read_slow(), which would do what percpu_counter_sum()
> does and in addition clear the 32-bit per-cpu counters. Would this be
> acceptable? It slows down the read side, but it gives an accurate count.
> It might slow down the write side as well (due to contention between
> readers and writers), but I guess that, due to batching, the effect would
> not be too pronounced. Should we go this way?
>
I like the precise one. Measuring the overhead of both, comparing them, and
then deciding is the usual way to go.
This accounting is a once-a-tick event, right? So, how about measuring the
read-side overhead?
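For illustration, a sketch of the proposed percpu_counter_read_slow() (not in
the tree; it mirrors percpu_counter_sum() and additionally folds the per-cpu
deltas into the 64-bit total):

s64 percpu_counter_read_slow(struct percpu_counter *fbc)
{
	s64 ret;
	int cpu;

	spin_lock(&fbc->lock);
	ret = fbc->count;
	for_each_online_cpu(cpu) {
		s32 *pcount = per_cpu_ptr(fbc->counters, cpu);
		ret += *pcount;
		*pcount = 0;	/* fold the per-cpu delta into the total */
	}
	fbc->count = ret;
	spin_unlock(&fbc->lock);
	return ret;
}

Note that clearing a remote per-cpu delta can race with a concurrent
__percpu_counter_add() on that CPU (the sub-batch update is done without the
lock), so some drift is still possible even with this variant.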

> >
> > BTW, I'm not sure, but don't we need special handling if
> > CONFIG_VIRT_CPU_ACCOUNTING=y?
>
> AFAICS, no. Architectures that define CONFIG_VIRT_CPU_ACCOUNTING end up calling
> account_{system,user}_time(), where we have placed our hooks for
> cpuacct charging. So even on such architectures we should be able to
> get correct per-cgroup stime and utime.
>
ok,
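For reference, a simplified sketch of the hook placement being described
(the cpuacct hook name below is illustrative):

void account_user_time(struct task_struct *p, cputime_t cputime,
		       cputime_t cputime_scaled)
{
	/* existing per-task / per-process accounting ... */
	p->utime = cputime_add(p->utime, cputime);

	/*
	 * Per-cgroup charge for cpuacct.stat: look up p's cpuacct group
	 * and add to its utime counter (hook name is illustrative).
	 */
	cpuacct_stat_account(p, CPUACCT_STAT_USER, cputime);

	/* existing kstat_cpu user/nice accounting follows ... */
}

Since both the tick path and the CONFIG_VIRT_CPU_ACCOUNTING architectures
funnel through account_user_time()/account_system_time(), a hook placed there
covers both cases.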

Thanks,
-Kame
