RE: [Lse-tech] Re: [RFC] Dynamic percpu data allocator

From: BALBIR SINGH (balbir.singh@wipro.com)
Date: Fri May 24 2002 - 06:59:28 EST


Thanks, when I ment CPU local memory on SMP, I ment CPU cache.
Sorry for the confusion. I think your dynamic allocator makes
sense.

Balbir

|
|On Fri, May 24, 2002 at 02:08:50PM +0530, BALBIR SINGH wrote:
|>
|> Sure, I understand what you are talking about now. That makes a lot
|> of sense, I will go through your document once more and read it.
|> I was thinking of the two combined (allocating CPU local memory
|> for certain data structs also includes allocating one copy per CPU).
|> Is there a reason to delay the implementation of CPU local memory,
|> or are we waiting for NUMA guys to do it? Is it not useful in an
|> SMP system to allocate CPU local memory?
|
|In an SMP system, the entire memory is equidistant from the CPUs.
|So, any memory that is exclusively accessed by once cpu only
|is CPU-local. On a NUMA machine however that isn't true, so
|you need special schemes.
|
|The thing about one-copy-per-cpu allocator that I describe is that
|it interleaves per-cpu data to save on space. That is if you
|allocate per-cpu ints i1, i2, it will be laid out in memory like this -
|
| CPU #0 CPU#1
|
| --------- --------- Start of cache line
| i1 i1
| i2 i2
|
| . .
| . .
| . .
| . .
| . .
|
| --------- ---------- End of cache line
|
|The per-cpu copies of i1 and i2 for CPU #0 and CPU #1 are allocated from
|different cache lines of memory, but copy of i1 and i2 for CPU #0 are
|in the same cache line. This interleaving saves space by avoiding
|the need to pad small data structures to cache line sizes.
|This essentially how the static per-cpu data area in 2.5 kernel
|is laid out in memory. Since copies for CPU #0 and CPU #1 for
|the same variable are on different cache lines, assuming that
|code that accesses "this" CPU's copy will not result in cache line
|bouncing. On an SMP machine, I can allocate the cache lines
|for different CPUs, where the interleaved data structures are
|laid out, using the slab allocator. On a NUMA machine however,
|I would want to make sure that cache line allocated for this
|purpose for CPU #N is closest possible to CPU #N.
|
|
|Thanks
|--
|Dipankar Sarma <dipankar@in.ibm.com> http://lse.sourceforge.net
|Linux Technology Center, IBM Software Lab, Bangalore, India.
|-
|To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
|the body of a message to majordomo@vger.kernel.org
|More majordomo info at http://vger.kernel.org/majordomo-info.html
|Please read the FAQ at http://www.tux.org/lkml/



-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/



This archive was generated by hypermail 2b29 : Fri May 31 2002 - 22:00:12 EST