Re: [tip:x86/mm] x86/mm/numa: Fix 32-bit kernel NUMA boot

From: Yinghai Lu
Date: Fri Dec 20 2013 - 01:23:26 EST


On Thu, Dec 19, 2013 at 6:17 PM, Lans Zhang <jia.zhang@xxxxxxxxxxxxx> wrote:
> On 12/20/2013 12:44 AM, Yinghai Lu wrote:
>>
>> On Thu, Dec 19, 2013 at 7:42 AM, tip-bot for Lans Zhang
>> <tipbot@xxxxxxxxx> wrote:
>>>
>>> Commit-ID: f3d815cb854b2f6262ade56a4d91a1ed3f1e50c4
>>> Gitweb: http://git.kernel.org/tip/f3d815cb854b2f6262ade56a4d91a1ed3f1e50c4
>>> Author: Lans Zhang <jia.zhang@xxxxxxxxxxxxx>
>>> AuthorDate: Fri, 6 Dec 2013 12:18:30 +0800
>>> Committer: Ingo Molnar <mingo@xxxxxxxxxx>
>>> CommitDate: Thu, 19 Dec 2013 13:58:36 +0100
>>>
>>> x86/mm/numa: Fix 32-bit kernel NUMA boot
>>>
>>> When booting a 32-bit x86 kernel on a NUMA machine, node data
>>> cannot be allocated from the local node if the memory of
>>> node 0 covers the low memory space entirely:
>>>
>>> [ 0.000000] Initmem setup node 0 [mem 0x00000000-0x83fffffff]
>>> [ 0.000000] NODE_DATA [mem 0x367ed000-0x367edfff]
>>> [ 0.000000] Initmem setup node 1 [mem 0x840000000-0xfffffffff]
>>> [ 0.000000] Cannot find 4096 bytes in node 1
>>> [ 0.000000] 64664MB HIGHMEM available.
>>> [ 0.000000] 871MB LOWMEM available.
>>>
>>> To fix this issue, node data is allowed to be allocated from
>>> other nodes if the memory of the local node is not yet mapped. The
>>> expected result looks like this:
>>>
>>> [ 0.000000] Initmem setup node 0 [mem 0x00000000-0x83fffffff]
>>> [ 0.000000] NODE_DATA [mem 0x367ed000-0x367edfff]
>>> [ 0.000000] Initmem setup node 1 [mem 0x840000000-0xfffffffff]
>>> [ 0.000000] NODE_DATA [mem 0x367ec000-0x367ecfff]
>>> [ 0.000000] NODE_DATA(1) on node 0
>>> [ 0.000000] 64664MB HIGHMEM available.
>>> [ 0.000000] 871MB LOWMEM available.
>>>
>>> Signed-off-by: Lans Zhang <jia.zhang@xxxxxxxxxxxxx>
>>> Cc: <andi@xxxxxxxxxxxxxx>
>>> Cc: Yinghai Lu <yinghai@xxxxxxxxxx>
>>> Link: http://lkml.kernel.org/r/1386303510-18574-1-git-send-email-jia.zhang@xxxxxxxxxxxxx
>>> Signed-off-by: Ingo Molnar <mingo@xxxxxxxxxx>
>>> ---
>>> arch/x86/mm/numa.c | 10 +++++++---
>>> 1 file changed, 7 insertions(+), 3 deletions(-)
>>>
>>> diff --git a/arch/x86/mm/numa.c b/arch/x86/mm/numa.c
>>> index 24aec58..c85da7b 100644
>>> --- a/arch/x86/mm/numa.c
>>> +++ b/arch/x86/mm/numa.c
>>> @@ -211,9 +211,13 @@ static void __init setup_node_data(int nid, u64 start, u64 end)
>>>  	 */
>>>  	nd_pa = memblock_alloc_nid(nd_size, SMP_CACHE_BYTES, nid);
>>>  	if (!nd_pa) {
>>> -		pr_err("Cannot find %zu bytes in node %d\n",
>>> -		       nd_size, nid);
>>> -		return;
>>> +		nd_pa = __memblock_alloc_base(nd_size, SMP_CACHE_BYTES,
>>> +					      MEMBLOCK_ALLOC_ACCESSIBLE);
>>> +		if (!nd_pa) {
>>> +			pr_err("Cannot find %zu bytes in node %d\n",
>>> +			       nd_size, nid);
>>> +			return;
>>> +		}
>>>  	}
>>>  	nd = __va(nd_pa);
>>>
>>
>> Can you just use memblock_alloc_try_nid instead of memblock_alloc_nid?
>
>
> But memblock_alloc_base() inside memblock_alloc_try_nid() may cause a kernel
> panic if the __memblock_alloc_base() call inside it fails. At this stage, a
> failure to allocate node data is tolerated.
>

That path takes MEMBLOCK_ALLOC_ACCESSIBLE, so the allocation should not fail.

BTW, if it does fail, something is wrong and the kernel should panic, as it
cannot allocate any memory at all.
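
To make the two options concrete, here is a minimal userspace sketch of
the behaviours being compared. It is not kernel code: every mock_* name,
the node ranges and the mapped limit are hypothetical, alignment is
ignored, and allocation is bottom-up for simplicity. It only models
"try the local node, then any accessible memory", with the fallback
either returning 0 (as in the patch) or flagging a panic (as
memblock_alloc_try_nid() is described above).

```c
#include <assert.h>
#include <stdint.h>

typedef uint64_t phys_addr_t;

/* Hypothetical model of a NUMA node as a bump allocator. */
struct mock_node {
	phys_addr_t start;	/* first address owned by the node */
	phys_addr_t next;	/* next free address */
	phys_addr_t end;	/* one past the last address */
};

/* Assumed layout: node 1 lies entirely above the mapped low memory. */
static struct mock_node mock_nodes[2] = {
	{ 0x1000ULL,      0x1000ULL,      0x840000000ULL  },
	{ 0x840000000ULL, 0x840000000ULL, 0x1000000000ULL },
};
static const phys_addr_t mock_mapped_limit = 0x36800000ULL;

/* Node-local attempt: returns 0 if the node has no mapped free memory. */
static phys_addr_t mock_alloc_nid(phys_addr_t size, int nid)
{
	struct mock_node *n = &mock_nodes[nid];
	phys_addr_t limit = n->end < mock_mapped_limit ? n->end : mock_mapped_limit;
	phys_addr_t pa;

	assert(size > 0);
	if (n->next >= limit || size > limit - n->next)
		return 0;
	pa = n->next;
	n->next += size;
	return pa;
}

/* Any-node attempt over accessible (mapped) memory; 0 on failure. */
static phys_addr_t mock_alloc_accessible(phys_addr_t size)
{
	for (int nid = 0; nid < 2; nid++) {
		phys_addr_t pa = mock_alloc_nid(size, nid);
		if (pa)
			return pa;
	}
	return 0;
}

/* Patch-style: local node first, then any node; failure is tolerated. */
static phys_addr_t mock_alloc_node_data(phys_addr_t size, int nid)
{
	phys_addr_t pa = mock_alloc_nid(size, nid);

	return pa ? pa : mock_alloc_accessible(size);
}

/* try_nid-style: same order, but a failed fallback counts as a panic. */
static phys_addr_t mock_alloc_try_nid(phys_addr_t size, int nid, int *panicked)
{
	phys_addr_t pa = mock_alloc_node_data(size, nid);

	*panicked = (pa == 0);
	return pa;
}

/* Index of the node owning addr, or -1 if no node owns it. */
static int mock_node_of(phys_addr_t addr)
{
	for (int nid = 0; nid < 2; nid++)
		if (addr >= mock_nodes[nid].start && addr < mock_nodes[nid].end)
			return nid;
	return -1;
}
```

In this model a 4096-byte request for node 1 fails locally and is then
served from node 0, which matches the "NODE_DATA(1) on node 0" line in the
quoted log. The two variants differ only when no accessible memory is left
at all, which is the case debated above.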

Thanks

Yinghai