Re: [tip:sched/core] sched: cpuacct: Use bigger percpu counter batch values for stats counters

From: KOSAKI Motohiro
Date: Tue May 12 2009 - 06:44:55 EST


>
> * Balbir Singh <balbir@xxxxxxxxxxxxxxxxxx> wrote:
>
> > * KOSAKI Motohiro <kosaki.motohiro@xxxxxxxxxxxxxx> [2009-05-12 19:13:42]:
> >
> > > > > +#ifdef CONFIGCONFIG_SMP
> > > > > + cpuacct_batch = jiffies_to_cputime(percpu_counter_batch);
> > > > > +#endif
> > > >
> > > > Slow down and compile patches before sending them out.. please. That
> > > > is a basic expectation if you expect it to be merged.
> > >
> > > Unfortunately, this mistake passed testing successfully ;)
> > > That is because cpuacct_batch=0 works even on SMP.
> > >
> >
> > OK, BTW, using an #ifdef right in the middle of a function makes
> > the code harder to read, can't we use an inline function to
> > abstract out SMP?
>
> or rather, to make cpuacct_batch have a sane value on UP too. (1?
> 0?)

umm..

I've reviewed my patch again, but sched_init() already has multiple
#ifdef CONFIG_SMP blocks, so I don't think a cosmetic change would improve
readability much.
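
For reference, the inline-helper shape suggested in the quoted text might look
roughly like the sketch below. It is not part of the patch that follows. The
helper name cpuacct_batch_default() is hypothetical, and the definitions under
"stand-ins" are made-up placeholders (including the 4000 scale factor) so that
the fragment compiles outside the kernel tree.

```c
/*
 * Stand-ins so this compiles in userspace. In the kernel these come from
 * the percpu counter and cputime headers; the values here are invented.
 */
#ifdef CONFIG_SMP
static int percpu_counter_batch = 32;
#define jiffies_to_cputime(j) ((j) * 4000)
#endif

/*
 * Hypothetical helper: keeps the #ifdef out of the caller's body.
 * Returns 0 when CONFIG_SMP is not defined, matching the value a
 * static variable would have if it were never assigned.
 */
static inline int cpuacct_batch_default(void)
{
#ifdef CONFIG_SMP
	return jiffies_to_cputime(percpu_counter_batch);
#else
	return 0;
#endif
}
```

With a helper like this, the caller would reduce to a single assignment such as
cpuacct_batch = cpuacct_batch_default(); with no #ifdef at the call site.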



------------------------------------
Subject: [PATCH] sched: cpuacct: Use bigger percpu counter batch values for stats counters

The percpu counters used to accumulate statistics in the cpuacct controller use
the default batch value [max(2*nr_cpus, 32)], which can be too small for
archs that define VIRT_CPU_ACCOUNTING. On such archs, a single tick can result
in cputime updates in the range of thousands. As a result,
cpuacct_update_stats() ends up acquiring the percpu counter spinlock on every
tick, which is not good for performance.

Let those architectures have a bigger batch threshold so that the percpu counter
spinlock isn't taken on every tick. This change doesn't affect archs that
don't define VIRT_CPU_ACCOUNTING; they continue to use the default
percpu counter batch value.
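
The effect described above can be illustrated with a small userspace model of
the batching rule. This is only a sketch: the struct and function names are
invented for illustration and are not kernel API, the model tracks a single
CPU, and it counts slow-path entries instead of taking a real lock. The
per-tick value of 4000 used in the note after the code is an assumed figure
standing in for "in the range of thousands".

```c
#include <stdint.h>

/* One CPU's view of a batched counter (illustrative model only). */
struct model_counter {
	int64_t global;                  /* shared total, lock-protected in the kernel */
	int32_t local;                   /* this CPU's not-yet-folded delta */
	unsigned long lock_acquisitions; /* how often the slow path ran */
};

/*
 * Batching rule: accumulate locally and fold into the shared total only
 * once the local delta reaches +batch or -batch.
 */
static void model_counter_add(struct model_counter *c, int32_t amount,
			      int32_t batch)
{
	int64_t count = (int64_t)c->local + amount;

	if (count >= batch || count <= -batch) {
		c->lock_acquisitions++;  /* the spinlock would be taken here */
		c->global += count;
		c->local = 0;
	} else {
		c->local = (int32_t)count;
	}
}

/* Apply `ticks` updates of `per_tick` each and report slow-path entries. */
static unsigned long model_run(int32_t per_tick, int32_t batch, int ticks)
{
	struct model_counter c = { 0, 0, 0 };
	int i;

	for (i = 0; i < ticks; i++)
		model_counter_add(&c, per_tick, batch);
	return c.lock_acquisitions;
}
```

In this model, model_run(4000, 32, 100) enters the slow path on all 100 ticks,
because every update already exceeds the batch. Scaling the batch by the same
assumed factor, model_run(4000, 32 * 4000, 100), enters it only 3 times, which
is the same count as model_run(1, 32, 100) for a jiffy-granular update.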

v7:
- fix typo and changelog

v6:
- fix build error when UP

v5:
- move cpuacct_batch initialization into sched_init()

v4:
- rewrite patch description (thanks Bharata!)
- append read_mostly to cpuacct_batch
- cpuacct_batch is initialized by sched_init_debug()

v3:
- revert using percpu_counter_sum()

v2:
- use percpu_counter_sum() instead of percpu_counter_read()

Cc: Balaji Rao <balajirrao@xxxxxxxxx>
Cc: Dhaval Giani <dhaval@xxxxxxxxxxxxxxxxxx>
Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@xxxxxxxxxxxxxx>
Cc: Peter Zijlstra <a.p.zijlstra@xxxxxxxxx>
Cc: Balbir Singh <balbir@xxxxxxxxxxxxxxxxxx>
Cc: Ingo Molnar <mingo@xxxxxxx>
Cc: Martin Schwidefsky <schwidefsky@xxxxxxxxxx>
Signed-off-by: Bharata B Rao <bharata@xxxxxxxxxxxxxxxxxx>
Signed-off-by: KOSAKI Motohiro <kosaki.motohiro@xxxxxxxxxxxxxx>
---
kernel/sched.c | 9 ++++++++-
1 file changed, 8 insertions(+), 1 deletion(-)

Index: b/kernel/sched.c
===================================================================
--- a/kernel/sched.c 2009-05-12 13:12:59.000000000 +0900
+++ b/kernel/sched.c 2009-05-12 19:04:49.000000000 +0900
@@ -870,6 +870,8 @@ static __read_mostly int scheduler_runni
  */
 int sysctl_sched_rt_runtime = 950000;

+static __read_mostly s32 cpuacct_batch;
+
 static inline u64 global_rt_period(void)
 {
 	return (u64)sysctl_sched_rt_period * NSEC_PER_USEC;
@@ -9284,6 +9286,10 @@ void __init sched_init(void)

 	perf_counter_init();

+#ifdef CONFIG_SMP
+	cpuacct_batch = jiffies_to_cputime(percpu_counter_batch);
+#endif
+
 	scheduler_running = 1;
 }

@@ -10457,7 +10463,8 @@ static void cpuacct_update_stats(struct
 	ca = task_ca(tsk);

 	do {
-		percpu_counter_add(&ca->cpustat[idx], val);
+		__percpu_counter_add(&ca->cpustat[idx], val, cpuacct_batch);
+
 		ca = ca->parent;
 	} while (ca);
 	rcu_read_unlock();

