Re: NUMA aware slab allocator V3

From: Martin J. Bligh
Date: Mon May 16 2005 - 13:15:04 EST


>> > How do the concepts of numa node id relate to discontig node ids?
>>
>> I believe there are quite a few assumptions on some architectures that,
>> when NUMA is on, they are equivalent. It appears to be pretty much
>> assumed everywhere that CONFIG_NUMA=y means one pg_data_t per NUMA node.
>
> Ah. That sounds much better.
>
>> Remember, as you saw, you can't assume that MAX_NUMNODES=1 when NUMA=n
>> because of the DISCONTIG=y case.
>
> I have never seen such a machine. A SMP machine with multiple
> "nodes"? So essentially one NUMA node has multiple discontig "nodes"?

I believe you (SGI) make one ;-) Anywhere where you have large gaps in
the physical address range within a node, this is what you really need.
Except ia64 has this wierd virtual mem_map thing that can go away once
we have sparsemem.

> This means that the concept of a node suddenly changes if there is just
> one numa node(CONFIG_NUMA off implies one numa node)?

The end point of where we're getting to is 1 node = 1 pgdat (which we can
then rename to struct node or something). All this confusing mess of
config options is just a migration path, which I'll leave it to Andy to
explain ;-)

> s/CONFIG_NUMA/CONFIG_NEED_MULTIPLE_NODES?
>
> That will not work because the idea is the localize the slabs to each
> node.
>
> If there are multiple nodes per numa node then invariable one node in the
> numa node (sorry for this duplication of what node means but I did not
> do it) must be preferred since numa_node_id() does not return a set of
> discontig nodes.
>
> Sorry but this all sounds like an flaw in the design. There is no
> consistent notion of node. Are you sure that this is not a ppc64 screwup?

No, it's a discontigmem screwup. Currently a pgdat represents 2 different
scenarios:

(1) physically discontiguous memory chunk.
(2) a NUMA node.

I don't think we support both at the same time with the old code. So it
seems to me like your numa aware slab code (which I'm still intending to
go read, but haven't yet) is only interested in real nodes. Logically
speaking, that would be CONFIG_NUMA. The current transition config options
are a bit of a mess ... Andy, I presume CONFIG_NEED_MULTIPLE_NODES is
really CONFIG_NEED_MULTIPLE_PGDATS ?

M.
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/