Re: CONFIG_HOLES_IN_ZONE and memory hot plug code on x86_64

From: Steffen Persvold
Date: Fri Aug 28 2015 - 03:16:09 EST

On 27/08/15 22:20, "Yinghai Lu" <yinghai@xxxxxxxxxx> wrote:

>On Fri, Jun 26, 2015 at 4:31 PM, Steffen Persvold <sp@xxxxxxxxxxxxx> wrote:
>> We've encountered an issue in a special case where we have a sparse E820 map [1].
>>
>> Basically the memory hotplug code is causing a "kernel paging request" BUG [2].
>
>the trace does not look like hotplug path.
>
>>
>> By instrumenting the function register_mem_sect_under_node() in drivers/base/node.c we see that it is called twice with the same struct memory_block argument:
>>
>> [ 1.901463] register_mem_sect_under_node: start = 80, end = 8f, nid = 0
>> [ 1.908129] register_mem_sect_under_node: start = 80, end = 8f, nid = 1
>
>Can you post whole log with SRAT related info?

I can probably reproduce this and get full logs when I next get run time on the system, but here's some output that we saved in our internal Jira case:

[ 0.000000] NUMA: Initialized distance table, cnt=6
[ 0.000000] NUMA: Node 0 [mem 0x00000000-0x0009ffff] + [mem 0x00100000-0xd7ffffff] -> [mem 0x00000000-0xd7ffffff]
[ 0.000000] NUMA: Node 0 [mem 0x00000000-0xd7ffffff] + [mem 0x100000000-0x427ffffff] -> [mem 0x00000000-0x427ffffff]
[ 0.000000] NODE_DATA(0) allocated [mem 0x407fe3000-0x407ffffff]
[ 0.000000] NODE_DATA(1) allocated [mem 0x807fe3000-0x807ffffff]
[ 0.000000] NODE_DATA(2) allocated [mem 0xc07fe3000-0xc07ffffff]
[ 0.000000] NODE_DATA(3) allocated [mem 0x1007fe3000-0x1007ffffff]
[ 0.000000] NODE_DATA(4) allocated [mem 0x1407fe3000-0x1407ffffff]
[ 0.000000] NODE_DATA(5) allocated [mem 0x1807fdd000-0x1807ff9fff]
[ 0.000000] [ffffea0000000000-ffffea00101fffff] PMD -> [ffff8803f8600000-ffff880407dfffff] on node 0
[ 0.000000] [ffffea0010a00000-ffffea00201fffff] PMD -> [ffff8807f8600000-ffff880807dfffff] on node 1
[ 0.000000] [ffffea0020a00000-ffffea00301fffff] PMD -> [ffff880bf8600000-ffff880c07dfffff] on node 2
[ 0.000000] [ffffea0030a00000-ffffea00401fffff] PMD -> [ffff880ff8600000-ffff881007dfffff] on node 3
[ 0.000000] [ffffea0040a00000-ffffea00501fffff] PMD -> [ffff8813f8600000-ffff881407dfffff] on node 4
[ 0.000000] [ffffea0050a00000-ffffea00601fffff] PMD -> [ffff8817f7e00000-ffff8818075fffff] on node 5

If I remember correctly there was a mix of 4GB and 8GB DIMMs populated on this system. In addition, the firmware reserved 512MB at the end of each memory controller's physical range (hence the reserved ranges in the e820 map).

Note: this was with vanilla 4.1.0, so it could be obsolete now with 4.2-rc. I have not yet tested with the latest patches that you and Tony discussed.


Cheers,
Steffen

