[NUMA , x86_64] Why memnode_shift is chosen with the lowest possiblevalue ?

From: Eric Dumazet
Date: Thu Sep 29 2005 - 08:41:35 EST


Hi Andi

I have a dual Opteron machine, with 8GB of ram on each node.
With latest kernels I have high CPU profiles in mm/slab.c
kfree() is NUMA aware, so far so good, but the price seems heavy.

I noticed in 2.6.14-rc2 syslog :

Node 0 MemBase 0000000000000000 Limit 00000001ffffffff
Node 1 MemBase 0000000200000000 Limit 00000003ffffffff
Using 23 for the hash shift. Max adder is 3ffffffff

instead of previous (2.6.13) :

Node 0 MemBase 0000000000000000 Limit 00000001ffffffff
Node 1 MemBase 0000000200000000 Limit 00000003ffffffff
Using 27 for the hash shift. Max adder is 3ffffffff

After some code review, I see NODEMAPSIZE raised from 0xff to 0xfff

phys_to_nid() is now reading one byte out of 2048 bytes with (memnode_shift=23, units of 8MB).

But shouldnt we try to use the highest possible value for memnode_shift ?

Using memnode_shift=33 would access only 2 bytes from this memnodemap[], touching fewer cache lines (well , one cache line). kfree() and friends would be slightly faster, at least cache friendly.


Another question is :

Could we add in pda (struct x8664_pda) the node of the cpu ?

We currently do :

#define numa_node_id() (cpu_to_node(raw_smp_processor_id()))

Instead of reading the processor_id from pda, then access cpu_to_node[], we could directly get this information from pda.

#if defined(CONFIG_NUMA)
static inline __attribute_pure__ int numa_node_id() { return read_pda(node);}
#else
#define numa_node_id() 0
#endif

Thank you

Eric
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/