Re: [patch 3/4] net: Percpufy frequently used variables --proto.sockets_allocated

From: Andrew Morton
Date: Fri Jan 27 2006 - 19:22:43 EST


Ravikiran G Thirumalai <kiran@xxxxxxxxxxxx> wrote:
>
> On Fri, Jan 27, 2006 at 03:08:47PM -0800, Andrew Morton wrote:
> > Andrew Morton <akpm@xxxxxxxx> wrote:
> > >
> > > Oh, and because vm_acct_memory() is counting a singleton object, it can use
> > > DEFINE_PER_CPU rather than alloc_percpu(), so it saves on a bit of kmalloc
> > > overhead.
> >
> > Actually, I don't think that's true. we're allocating a sizeof(long) with
> > kmalloc_node() so there shouldn't be memory wastage.
>
> Oh yeah there is. Each dynamic per-cpu object would have been atleast
> (NR_CPUS * sizeof (void *) + num_cpus_possible * cacheline_size ).
> Now kmalloc_node will fall back on size-32 for allocation of long, so
> replace the cacheline_size above with 32 -- which then means dynamic per-cpu
> data are not on a cacheline boundary anymore (most modern cpus have 64byte/128
> byte cache lines) which means per-cpu data could end up false shared....
>

OK. But isn't the core of the problem the fact that __alloc_percpu() is
using kmalloc_node() rather than a (new, as-yet-unimplemented)
kmalloc_cpu()? kmalloc_cpu() wouldn't need the L1 cache alignment.

It might be worth creating just a small number of per-cpu slabs (4-byte,
8-byte). A kmalloc_cpu() would just need a per-cpu array of
kmem_cache_t*'s and it'd internally use kmalloc_node(cpu_to_node), no?

Or we could just give __alloc_percpu() a custom, hand-rolled,
not-cacheline-padded sizeof(long) slab per CPU and use that if (size ==
sizeof(long)). Or something.
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/