Re: [PATCH x86/mm UPDATED] x86-64, NUMA: Fix distance table handling

From: Yinghai Lu
Date: Wed Mar 02 2011 - 16:37:17 EST


On 03/02/2011 01:12 PM, Yinghai Lu wrote:
> On 03/02/2011 07:42 AM, Tejun Heo wrote:
>> Hey,
>>
>> On Wed, Mar 02, 2011 at 06:30:59AM -0800, David Rientjes wrote:
>>> Acked-by: David Rientjes <rientjes@xxxxxxxxxx>
>>>
>>> There's also this in numa_emulation() that isn't a safe assumption:
>>>
>>> 	/* make sure all emulated nodes are mapped to a physical node */
>>> 	for (i = 0; i < ARRAY_SIZE(emu_nid_to_phys); i++)
>>> 		if (emu_nid_to_phys[i] == NUMA_NO_NODE)
>>> 			emu_nid_to_phys[i] = 0;
>>>
>>> Node id 0 is not always online depending on how you setup your SRAT. I'm
>>> not sure why emu_nid_to_phys[] would ever map a fake node id that doesn't
>>> exist to a physical node id rather than NUMA_NO_NODE, so I think it can
>>> just be removed. Otherwise, it should be mapped to a physical node id
>>> that is known to be online.
>>
>> Unless I screwed up, that behavior isn't new. It was just put in a
>> different form. Looking through the code... Okay, I think node 0
>> always exists. SRAT PXM isn't used as node number directly. It goes
>> through acpi_map_pxm_to_node() which allocates nids from 0 up.
>> amdtopology also guarantees the existence of node 0, so I think we're
>> in the safe and that probably is the reason why we had the above
>> behavior in the first place.
>>
>> IIRC, there are other places which assume the existence of node 0.
>> Whether it's a good idea or not, I'm not sure but requiring node 0 to
>> be always allocated doesn't sound too wrong to me. Maybe we can add
>> BUG_ON() if node 0 is offline somewhere.
>
>
> When the first socket does not have memory, we will not have node 0 online,
> and cpu_to_node() will have those cpus rounded to a nearby node like node 1 or node 7.
>
> BTW: this configuration has been broken several times and fixed several times.

David,

It looks like NUMA emulation already does not support that configuration.

old code:
void __cpuinit numa_add_cpu(int cpu)
{
	unsigned long addr;
	u16 apicid;
	int physnid;
	int nid = NUMA_NO_NODE;

	apicid = early_per_cpu(x86_cpu_to_apicid, cpu);
	if (apicid != BAD_APICID)
		nid = apicid_to_node[apicid];
	if (nid == NUMA_NO_NODE)
		nid = early_cpu_to_node(cpu);
	BUG_ON(nid == NUMA_NO_NODE || !node_online(nid));


current code:
void __cpuinit numa_add_cpu(int cpu)
{
	int physnid, nid;

	nid = numa_cpu_node(cpu);
	if (nid == NUMA_NO_NODE)
		nid = early_cpu_to_node(cpu);
	BUG_ON(nid == NUMA_NO_NODE || !node_online(nid));

	physnid = emu_nid_to_phys[nid];

	/*
	 * Map the cpu to each emulated node that is allocated on the physical
	 * node of the cpu's apic id.
	 */
	for_each_online_node(nid)
		if (emu_nid_to_phys[nid] == physnid)
			cpumask_set_cpu(cpu, node_to_cpumask_map[nid]);
}


Please note that numa_cpu_node(), like the old code, will return node 0 as the nid even when node 0 has no memory and is not online.

Maybe we can just change it to nid = cpu_to_node(cpu) to get a node id that is online.
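
To make the failure mode concrete, here is a small userspace model, not kernel code. Everything in it is a hypothetical stand-in: the tables, the model_* and lookup_* helpers, and the assumption that cpu_to_node() has already been fixed up to point at a nearby online node (as described above for the memoryless first socket case). It only shows that the BUG_ON() condition in numa_add_cpu() would trigger with the APIC-id-derived nid and would not with the fixed-up one.

```c
#include <assert.h>
#include <stdbool.h>

#define NUMA_NO_NODE	(-1)
#define MAX_NODES	4
#define MAX_CPUS	4

/* Hypothetical layout: node 0 has no memory and is offline, node 1 is online. */
static const bool model_online[MAX_NODES] = { false, true, false, false };

/* Hypothetical: what an APIC-id based lookup (numa_cpu_node()) would report. */
static const int model_apicid_nid[MAX_CPUS] = { 0, 0, 1, 1 };

/* Hypothetical: cpu_to_node() after cpus on node 0 were rounded to node 1. */
static const int model_cpu_to_node[MAX_CPUS] = { 1, 1, 1, 1 };

static bool model_node_online(int nid)
{
	return nid >= 0 && nid < MAX_NODES && model_online[nid];
}

/* Models the current code path: nid comes from the APIC id mapping. */
static int lookup_current(int cpu)
{
	assert(cpu >= 0 && cpu < MAX_CPUS);
	return model_apicid_nid[cpu];
}

/* Models the suggested change: nid comes from the fixed-up per-cpu mapping. */
static int lookup_proposed(int cpu)
{
	assert(cpu >= 0 && cpu < MAX_CPUS);
	return model_cpu_to_node[cpu];
}

/* Same condition as BUG_ON(nid == NUMA_NO_NODE || !node_online(nid)). */
static bool would_bug(int nid)
{
	return nid == NUMA_NO_NODE || !model_node_online(nid);
}
```

In this model, would_bug(lookup_current(0)) is true because cpu 0 is reported on the offline node 0, while would_bug(lookup_proposed(0)) is false. Whether cpu_to_node() is already valid at the point numa_add_cpu() runs is not checked here and would need to be confirmed in the real code.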

Thanks

Yinghai