Re: #tj-percpu has been rebased

From: H. Peter Anvin
Date: Fri Feb 13 2009 - 20:57:21 EST


Tejun Heo wrote:

Percpu areas are allocated in chunks in the vmalloc area. Each chunk
consists of num_possible_cpus() units, and the first chunk is used for
static percpu variables in the kernel image (special boot-time
alloc/init handling is necessary, as these areas need to be brought up
before allocation services are running). Units grow as necessary, and
all units grow or shrink in unison. When a chunk fills up,
another chunk is allocated, i.e. in the vmalloc area:

         c0                           c1                     c2
 -------------------          -------------------        ------------
| u0 | u1 | u2 | u3 |        | u0 | u1 | u2 | u3 |      | u0 | u1 | u
 -------------------  ......  -------------------  ....  ------------

Allocation is done in offset-size areas of a single unit's space. I.e.,
when UNIT_SIZE is 128k, a 512-byte area at offset 134k occupies 512 bytes
at 6k of c1:u0, c1:u1, c1:u2 and c1:u3. Percpu access can be done by
configuring percpu base registers UNIT_SIZE apart.
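
The offset arithmetic in the 134k example can be sketched in C as
follows. This is only an illustration of the description above, not code
from the patch set: NR_UNITS, struct area_loc, locate_area(),
cpu_copy_addr() and the chunk_base array are hypothetical names, and
NR_UNITS is fixed at 4 to match the diagram rather than taken from
num_possible_cpus().

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define UNIT_SIZE (128u * 1024u) /* unit size used in the example */
#define NR_UNITS  4u             /* assumed; stands in for num_possible_cpus() */

/* Hypothetical helper type: where an area falls within the chunk layout. */
struct area_loc {
    size_t chunk;       /* index of the chunk (c0, c1, ...) holding the area */
    size_t unit_offset; /* offset of the area inside every unit of that chunk */
};

/* Split an offset in "single unit space" into chunk index and in-unit offset. */
static inline struct area_loc locate_area(size_t area_offset)
{
    struct area_loc loc;

    loc.chunk = area_offset / UNIT_SIZE;
    loc.unit_offset = area_offset % UNIT_SIZE;
    return loc;
}

/*
 * Address of one CPU's copy of the area. chunk_base[] holds the start
 * address of each chunk; chunks need not be adjacent to each other.
 */
static inline uintptr_t cpu_copy_addr(const uintptr_t *chunk_base,
                                      size_t area_offset, size_t cpu)
{
    struct area_loc loc = locate_area(area_offset);

    assert(cpu < NR_UNITS);
    return chunk_base[loc.chunk] + cpu * UNIT_SIZE + loc.unit_offset;
}
```

For an area at 134k, locate_area() yields chunk 1 and an in-unit offset
of 6k, and the copies for consecutive CPUs end up exactly UNIT_SIZE
apart, which is what lets the percpu base registers be set UNIT_SIZE
apart.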


Okay, let's think about this a bit.

At least for x86, there are two cases:

- 32 bits. The vmalloc area is *extremely* constrained, and has the same class of fragmentation issues as main memory. In fact, it might have *more* just by virtue of being so much smaller.

- 64 bits. At this point, we have with current memory sizes(*) an astronomically large virtual space. Here we have no real problem allocating linearly in virtual space, either by giving each CPU some very large hunk of virtual address space (which means each percpu area is contiguous in virtual space) or by doing large contiguous allocations out of another range.

At first glance there doesn't seem to me to be any advantage to interlacing the CPUs. Quite the contrary: it seems to utterly preclude ever getting a win from PMD mappings, since (a) you'd be allocating real memory for CPUs which aren't actually there and (b) you'd have the wrong NUMA associativity.
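
One way to put rough numbers on points (a) and (b) is the small C sketch below. It is an illustration under stated assumptions, not anything from the patches: PMD_PAGE_SIZE is assumed to be the 2 MiB large page of x86-64, and units_per_large_page() and absent_cpu_bytes() are hypothetical helper names.

```c
#include <assert.h>
#include <stddef.h>

#define PMD_PAGE_SIZE (2u * 1024u * 1024u) /* assumed: 2 MiB x86-64 large page */

/* Number of consecutive per-CPU units covered by a single large page. */
static inline size_t units_per_large_page(size_t unit_size)
{
    assert(unit_size > 0 && PMD_PAGE_SIZE % unit_size == 0);
    return PMD_PAGE_SIZE / unit_size;
}

/*
 * Bytes per chunk that back units of CPUs which are possible but not
 * present, if the whole chunk is populated with real memory.
 */
static inline size_t absent_cpu_bytes(size_t possible, size_t present,
                                      size_t unit_size)
{
    assert(present <= possible);
    return (possible - present) * unit_size;
}
```

With the 128k UNIT_SIZE from the quoted example, units_per_large_page() gives 16, so one large page would back the units of 16 different CPUs while coming from a single NUMA node. With, say, 16 possible but only 4 present CPUs, absent_cpu_bytes() gives 1536k per chunk backing CPUs that are not there.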

-hpa


(*) In about 20 years we better get the remaining virtual address bits...