[PATCH] better align percpu counter (Was Re: [tip:sched/core] sched: cpuacct: Use bigger percpu counter batch values for stats counters)

From: KAMEZAWA Hiroyuki
Date: Thu Aug 20 2009 - 04:43:26 EST


On Thu, 20 Aug 2009 16:24:51 +1000
Anton Blanchard <anton@xxxxxxxxx> wrote:

>
> Hi,
>
> > Could you share your context-switch test program?
> > I'd like to play with it to find out what I can do about the percpu counter.
>
> Sure:
>
> http://ozlabs.org/~anton/junkcode/context_switch.c
>
> Very simple, just run it once per core:
>
> for i in `seq 0 31`
> do
> taskset -c $i ./context_switch &
> done
>
> Then look at the context switch rates in vmstat.
>
Thank you for the test program.

Before adjusting the batch counter (I think you should modify it),
could you try this?

I only have an 8-CPU (2-socket) host, but it works well there.
(But my host is x86-64 and does not have virt-cpu-accounting.)

Results with your program (context switches per second):

before patch, cpuacct off: 414000-416000
before patch, cpuacct on : 401000-404000
after patch,  cpuacct on : 412000-413000

Maybe I should check cache misses later ;)
==
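It's bad to place the pointer to the array of per-cpu data on the
same cache line as the spinlock: every lock/unlock on another CPU
invalidates the line holding this read-mostly pointer. This patch
moves percpu_counter's counters pointer to its own cacheline to
reduce false sharing.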

Signed-off-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@xxxxxxxxxxxxxx>
---
include/linux/percpu_counter.h | 12 +++++++++++-
1 file changed, 11 insertions(+), 1 deletion(-)

Index: linux-2.6.31-rc6/include/linux/percpu_counter.h
===================================================================
--- linux-2.6.31-rc6.orig/include/linux/percpu_counter.h 2009-08-20 12:09:27.000000000 +0900
+++ linux-2.6.31-rc6/include/linux/percpu_counter.h 2009-08-20 17:31:13.000000000 +0900
@@ -14,14 +14,24 @@
#include <linux/types.h>

#ifdef CONFIG_SMP
+struct __percpu_counter_padding {
+ char x[0];
+} ____cacheline_internodealigned_in_smp;
+#define CACHELINE_PADDING(name) struct __percpu_counter_padding name

struct percpu_counter {
+ /*
+ * This pointer never changes after init and is read first in the fast
+ * path, so keep it off the cacheline that other CPUs dirty by locking.
+ */
+ s32 *counters;
+ CACHELINE_PADDING(pad);
spinlock_t lock;
s64 count;
#ifdef CONFIG_HOTPLUG_CPU
+ /* rarely accessed field */
struct list_head list; /* All percpu_counters are on a list */
#endif
- s32 *counters;
};

extern int percpu_counter_batch;

