RE: [2/8,v3] NUMA Hotplug Emulator: infrastructure of NUMA hotplug emulation

From: David Rientjes
Date: Sun Nov 21 2010 - 16:43:10 EST


On Sun, 21 Nov 2010, Li, Haicheng wrote:

> > I think what we'll end up wanting to do is something like this, which
> > adds a numa=possible=<N> parameter for x86; this will add an additional
> > N possible nodes to node_possible_map that we can use to online later.
> > It also adds a new /sys/devices/system/memory/add_node file which takes
> > a typical "size@start" value to hot-add an emulated node. For example,
> > using "mem=2G numa=possible=1" on the command line and doing
> > echo "128M@0x80000000" > /sys/devices/system/memory/add_node would
> > hot-add a node of 128M.
> >
> > Comments?
>
> Sorry for the late response; I've been on a business trip recently.
>
> David, your original concern was only about power/flexibility. I'm
> sure our implementation can better meet such requirements.
>

Not with hacky hidden nodes or being unnecessarily tied to e820, it can't.

> IMHO, I don't see any gain in power/flexibility from your patch compared
> to our original implementation; you just make things more complex and
> messy. Why not use "numa=hide=N*size" as originally implemented?

Hidden nodes are a hack and completely unnecessary for node hotplug
emulation; there's no need to have additional nodemasks or node states
throughout the kernel. They also require that you define the node sizes
at boot, whereas mine allows you to hotplug nodes of whatever sizes you
choose at runtime.

> - Later, you just need to online the node whenever you want, and it
> naturally/exactly emulates the behavior that current HW provides.

My proposal allows you to hotplug nodes of various sizes; they can be
offlined, their sizes can subsequently be changed, and they can be
re-hotplugged. It's a very dynamic and flexible model that allows you to
emulate all possible combinations of node hotplug without constantly
rebooting.

> - N is the number of possible nodes, and we can use 128M as the default
> size for each hidden node if the user doesn't specify a size.

My model allows you to define the node size you'd like to add at runtime.

> - If the user wants more memory for a hidden node, he just needs to
> specify the "size".
> - Besides, the user can also use "mem=" to hide more memory and later use
> the mem-add interface to freely attach more memory to the hidden node at
> runtime.
>

Each of these requires a reboot; with your model, you cannot emulate
hotplugging a node, offlining it, removing the memory, and re-hotplugging
the same node with a larger amount of added memory.

> Your patch introduces an additional dependency on "mem=", but ours is
> simple and flexibly compatible with both "mem=" and "numa=emu".
>

This is the natural use case of mem=: truncating the memory map so that
the kernel has only a portion of the usable memory. The remainder can be
used by this new interface, if desired, with complete power and control
over the size of the nodes you add, without having to conform to hidden
node sizes that you've specified at boot.