Re: [PATCH] of: return NUMA_NO_NODE from fallback of_node_to_nid()

From: Konstantin Khlebnikov
Date: Wed Apr 29 2015 - 04:30:30 EST


+x86@xxxxxxxxxx
+linux-metag@xxxxxxxxxxxxxxx

Here is the proposed fix:
https://www.mail-archive.com/linux-kernel@xxxxxxxxxxxxxxx/msg864009.html

It returns NUMA_NO_NODE from both the static inline (CONFIG_OF=n) and the
weak version of of_node_to_nid(). This change might affect the few arches
which have CONFIG_OF=y but don't implement of_node_to_nid() (i.e. depend
on the default behavior of the weak function). It seems this is only metag.
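
A minimal userspace sketch of the weak-symbol fallback described above,
assuming a GCC/Clang toolchain. It is not the kernel code: struct
device_node is a stand-in type and has_affinity() is a hypothetical
caller added only for illustration.

```c
#include <assert.h>
#include <stddef.h>

#define NUMA_NO_NODE (-1)

/* Stand-in for the kernel's struct device_node (illustration only). */
struct device_node {
	const char *name;
};

/*
 * Weak default: the linker uses this definition only when no other
 * object file provides a strong of_node_to_nid(). An architecture
 * with real NUMA support would supply its own version.
 */
__attribute__((weak)) int of_node_to_nid(struct device_node *np)
{
	(void)np;
	return NUMA_NO_NODE;
}

/* Hypothetical caller: does this device node carry a node preference? */
static int has_affinity(struct device_node *np)
{
	int nid = of_node_to_nid(np);

	assert(nid >= NUMA_NO_NODE); /* -1 or a real node id */
	return nid != NUMA_NO_NODE;
}
```

With only the weak definition linked in, has_affinity() reports no
preference for every device node. Linking a strong of_node_to_nid() from
another object file replaces the default without touching the callers.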

From the mm/ point of view, returning NUMA_NO_NODE is the right choice when
the code has no idea which NUMA node should be used: memory allocation
functions then choose the current NUMA node (but they might use any).
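
A toy model of that behavior, as a sketch only. The node table,
MAX_NODES, current_node and both helper functions are hypothetical and
do not correspond to kernel APIs; the real allocator logic is more
involved.

```c
#include <assert.h>
#include <stdbool.h>

#define NUMA_NO_NODE (-1)
#define MAX_NODES 4 /* hypothetical node count for this toy model */

/* Toy state: node 0 is offline, the running CPU sits on node 1. */
static const bool node_online[MAX_NODES] = { false, true, true, false };
static const int current_node = 1;

/* Turn a node hint into a concrete node, as an allocator might. */
static int resolve_alloc_node(int nid)
{
	if (nid == NUMA_NO_NODE)
		nid = current_node; /* "no affinity": use the local node */
	assert(nid >= 0 && nid < MAX_NODES);
	return nid;
}

/* In this model an allocation can only succeed on an online node. */
static bool can_allocate_on(int nid)
{
	return node_online[resolve_alloc_node(nid)];
}
```

In this model a caller that passes a fixed node 0 fails while node 0 is
offline, whereas a caller that passes NUMA_NO_NODE is redirected to the
current node and succeeds.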

On 29.04.2015 04:11, songxiumiao@xxxxxxxxxx wrote:
When we tested the CPU and memory hotplug feature on an x86 server
with kernel 4.0-rc4, we hit a similar problem.

When the memory in node0 is offline, the system crashes during boot.

Following is the bug information:
[ 0.335176] BUG: unable to handle kernel paging request at 0000000000001b08
[ 0.342164] IP: [<ffffffff81182587>] __alloc_pages_nodemask+0xb7/0x940
[ 0.348706] PGD 0
[ 0.350735] Oops: 0000 [#1] SMP
[ 0.353993] Modules linked in:
[ 0.357063] CPU: 0 PID: 1 Comm: swapper/0 Not tainted 4.0.0-rc4 #1
[ 0.363232] Hardware name: Inspur TS860/TS860, BIOS TS860_2.0.0 2015/03/24
[ 0.370095] task: ffff88085b1e0000 ti: ffff88085b1e8000 task.ti: ffff88085b1e8000
[ 0.377564] RIP: 0010:[<ffffffff81182587>] [<ffffffff81182587>] __alloc_pages_nodemask+0xb7/0x940
[ 0.386524] RSP: 0000:ffff88085b1ebac8 EFLAGS: 00010246
[ 0.391828] RAX: 0000000000001b00 RBX: 0000000000000010 RCX: 0000000000000000
[ 0.398953] RDX: 0000000000000000 RSI: 0000000000000000 RDI: 00000000002052d0
[ 0.406075] RBP: ffff88085b1ebbb8 R08: ffff88085b13fec0 R09: 000000005b13fe01
[ 0.413198] R10: ffff88085e807300 R11: ffffffff810d4bc1 R12: 000000000001002a
[ 0.420321] R13: 00000000002052d0 R14: 0000000000000001 R15: 00000000000040d0
[ 0.427446] FS: 0000000000000000(0000) GS:ffff88085ee00000(0000) knlGS:0000000000000000
[ 0.435522] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 0.441259] CR2: 0000000000001b08 CR3: 00000000019ae000 CR4: 00000000001406f0
[ 0.448382] Stack:
[ 0.450392] ffff88085b1e0000 0000000000000400 ffff88085b1effff ffff88085b1ebb68
[ 0.457846] 000000000000007b ffff88085b12d140 ffff88085b249000 000000000000007b
[ 0.465298] ffff88085b1ebb28 ffffffff81af2900 0000000000000000 002052d05b12d140
[ 0.472750] Call Trace:
[ 0.475206] [<ffffffff811d27b3>] ? deactivate_slab+0x383/0x400
[ 0.481123] [<ffffffff811d3947>] new_slab+0xa7/0x460
[ 0.486174] [<ffffffff816789e5>] __slab_alloc+0x310/0x470
[ 0.491655] [<ffffffff8105304f>] ? dmar_msi_set_affinity+0x8f/0xc0
[ 0.497921] [<ffffffff810d4bc1>] ? __irq_domain_add+0x41/0x100
[ 0.503838] [<ffffffff810d0fee>] ? irq_do_set_affinity+0x5e/0x70
[ 0.509920] [<ffffffff811d571d>] __kmalloc_node+0xad/0x2e0
[ 0.515483] [<ffffffff810d4bc1>] ? __irq_domain_add+0x41/0x100
[ 0.521392] [<ffffffff810d4bc1>] __irq_domain_add+0x41/0x100
[ 0.527133] [<ffffffff8105102e>] mp_irqdomain_create+0x9e/0x120
[ 0.533140] [<ffffffff81b2fb14>] setup_IO_APIC+0x64/0x1be
[ 0.538622] [<ffffffff81b2e226>] apic_bsp_setup+0xa2/0xae
[ 0.544099] [<ffffffff81b2bc70>] native_smp_prepare_cpus+0x267/0x2b2
[ 0.550531] [<ffffffff81b1927b>] kernel_init_freeable+0xf2/0x253
[ 0.556625] [<ffffffff8166b960>] ? rest_init+0x80/0x80
[ 0.561845] [<ffffffff8166b96e>] kernel_init+0xe/0xf0
[ 0.566979] [<ffffffff81681bd8>] ret_from_fork+0x58/0x90
[ 0.572374] [<ffffffff8166b960>] ? rest_init+0x80/0x80
[ 0.577591] Code: 30 97 00 89 45 bc 83 e1 0f b8 22 01 32 01 01 c9 d3 f8 83 e0 03 89 9d 6c ff ff ff 83 e3 10 89 45 c0 0f 85 6d 01 00 00 48 8b 45 88 <48> 83 78 08 00 0f 84 51 01 00 00 b8 01 00 00 00 44 89 f1 d3 e0
[ 0.597537] RIP [<ffffffff81182587>] __alloc_pages_nodemask+0xb7/0x940
[ 0.604158] RSP <ffff88085b1ebac8>
[ 0.607643] CR2: 0000000000001b08
[ 0.610962] ---[ end trace 0a600c0841386992 ]---
[ 0.615573] Kernel panic - not syncing: Fatal exception
[ 0.620792] ---[ end Kernel panic - not syncing: Fatal exception
From: Rob Herring <robherring2@xxxxxxxxx>
Date: 2015-04-14 00:49
To: Konstantin Khlebnikov <khlebnikov@xxxxxxxxxxxxxx>
CC: Grant Likely <grant.likely@xxxxxxxxxx>; devicetree@xxxxxxxxxxxxxxx;
    Rob Herring <robh+dt@xxxxxxxxxx>; linux-kernel@xxxxxxxxxxxxxxx;
    sparclinux@xxxxxxxxxxxxxxx; linux-mm@xxxxxxxxx;
    linuxppc-dev <linuxppc-dev@xxxxxxxxxxxxxxxx>
Subject: Re: [PATCH] of: return NUMA_NO_NODE from fallback of_node_to_nid()
On Mon, Apr 13, 2015 at 8:38 AM, Konstantin Khlebnikov
<khlebnikov@xxxxxxxxxxxxxx> wrote:
> On 13.04.2015 16:22, Rob Herring wrote:
>>
>> On Wed, Apr 8, 2015 at 11:59 AM, Konstantin Khlebnikov
>> <khlebnikov@xxxxxxxxxxxxxx> wrote:
>>>
>>> Node 0 might be offline as well as any other numa node,
>>> in this case kernel cannot handle memory allocation and crashes.
>>>
>>> Signed-off-by: Konstantin Khlebnikov <khlebnikov@xxxxxxxxxxxxxx>
>>> Fixes: 0c3f061c195c ("of: implement of_node_to_nid as a weak function")
>>> ---
>>> drivers/of/base.c | 2 +-
>>> include/linux/of.h | 5 ++++-
>>> 2 files changed, 5 insertions(+), 2 deletions(-)
>>>
>>> diff --git a/drivers/of/base.c b/drivers/of/base.c
>>> index 8f165b112e03..51f4bd16e613 100644
>>> --- a/drivers/of/base.c
>>> +++ b/drivers/of/base.c
>>> @@ -89,7 +89,7 @@ EXPORT_SYMBOL(of_n_size_cells);
>>> #ifdef CONFIG_NUMA
>>> int __weak of_node_to_nid(struct device_node *np)
>>> {
>>> - return numa_node_id();
>>> + return NUMA_NO_NODE;
>>
>>
>> This is going to break any NUMA machine that enables OF and expects
>> the weak function to work.
>
>
> Why? NUMA_NO_NODE == -1, which is the standard "no-affinity" signal.
> As far as I can see, the powerpc/sparc versions of of_node_to_nid()
> return -1 if they cannot find out which node should be used.
Ah, I was thinking those platforms were relying on the default
implementation. I guess any real NUMA support is going to need to
override this function. The arm64 patch series does that as well. We
need to be sure this change is correct for metag, which appears to be
the only other OF-enabled platform with NUMA support.

In that case, there is little reason to keep the inline and we
can just always enable the weak function (with your change). It is
slightly less optimal, but the few callers hardly appear to be hot
paths.

Rob
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/


--
Konstantin