Re: [PATCH v1 3/4] memblock: add MEMBLOCK_DRIVER_MANAGED to mimic IORESOURCE_SYSRAM_DRIVER_MANAGED

From: David Hildenbrand
Date: Fri Oct 01 2021 - 04:04:39 EST


On 30.09.21 23:21, Mike Rapoport wrote:
On Wed, Sep 29, 2021 at 06:54:01PM +0200, David Hildenbrand wrote:
On 29.09.21 18:39, Mike Rapoport wrote:
Hi,

On Mon, Sep 27, 2021 at 05:05:17PM +0200, David Hildenbrand wrote:
Let's add a flag that corresponds to IORESOURCE_SYSRAM_DRIVER_MANAGED.
Similar to MEMBLOCK_HOTPLUG, most infrastructure has to treat such memory
like ordinary MEMBLOCK_NONE memory -- for example, when selecting memory
regions to add to the vmcore for dumping in the crashkernel via
for_each_mem_range().
Can you please elaborate on the difference in semantics of MEMBLOCK_HOTPLUG
and MEMBLOCK_DRIVER_MANAGED?
Unless I'm missing something they both mark memory that can be unplugged
anytime and so it should not be used in certain cases. Why is there a need
for a new flag?

In the cover letter I have "Alternative B: Reuse MEMBLOCK_HOTPLUG.
MEMBLOCK_HOTPLUG serves a different purpose, though.", but looking into the
details it won't work as is.

MEMBLOCK_HOTPLUG is used to mark memory early during boot that can later get
hotunplugged again and should be placed into ZONE_MOVABLE if the
"movable_node" kernel parameter is set.

The confusing part is that we talk about "hotpluggable" but really mean
"hotunpluggable": the reason is that HW flags DIMM slots that can later be
hotplugged as "hotpluggable" even though there is already something
hotplugged.

The MEMBLOCK_HOTPLUG name is indeed somewhat confusing, but its core
meaning is still "this memory may be removed", which does not differ from
what IORESOURCE_SYSRAM_DRIVER_MANAGED means.

MEMBLOCK_HOTPLUG regions are indeed placed into ZONE_MOVABLE, but more
importantly, they are avoided when we allocate memory from memblock.

So, in my view, both flags mean that the memory may be removed and it
should not be used for certain types of allocations.

The semantics are different:

MEMBLOCK_HOTPLUG: memory is indicated as "System RAM" in the firmware-provided memory map and added to the system early during boot; we want this memory to be managed by ZONE_MOVABLE with "movable_node" set on the kernel command line, because only then do we want it to be hotunpluggable. kexec *has to* indicate this memory to the second kernel and can place kexec-images on this memory. After memory hotunplug, kexec has to be re-armed.

MEMBLOCK_DRIVER_MANAGED: memory is not indicated as "System RAM" in the firmware-provided memory map; this memory is always detected and added to the system by a driver; memory might not actually be physically hotunpluggable and the ZONE selection does not depend on "movable_node". kexec *must not* indicate this memory to the second kernel and *must not* place kexec-images on this memory.
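
To make the difference concrete, here is a rough userspace sketch of the two sets of rules. It is not kernel code: the TOY_* names, the flag values and the helper functions are made up for illustration and only encode the kexec and zone-selection rules stated above.

```c
#include <assert.h>
#include <stdbool.h>

/* Illustrative flag bits; names and values are assumptions of this sketch. */
enum toy_memblock_flags {
	TOY_MEMBLOCK_NONE           = 0x0,
	TOY_MEMBLOCK_HOTPLUG        = 0x1,
	TOY_MEMBLOCK_DRIVER_MANAGED = 0x8,
};

/* Does kexec describe this range to the second kernel? */
static bool toy_kexec_indicates(unsigned int flags)
{
	/* Driver-managed memory is not in the firmware map: never pass it on. */
	return !(flags & TOY_MEMBLOCK_DRIVER_MANAGED);
}

/* May kexec place an image on this range? */
static bool toy_kexec_may_place_image(unsigned int flags)
{
	return !(flags & TOY_MEMBLOCK_DRIVER_MANAGED);
}

/* Does zone selection for this range depend on "movable_node"? */
static bool toy_zone_depends_on_movable_node(unsigned int flags)
{
	/* Only firmware-described hotpluggable ranges follow "movable_node". */
	return (flags & TOY_MEMBLOCK_HOTPLUG) &&
	       !(flags & TOY_MEMBLOCK_DRIVER_MANAGED);
}

/* Small usage example covering both flags. */
static void toy_semantics_selfcheck(void)
{
	assert(toy_kexec_indicates(TOY_MEMBLOCK_HOTPLUG));
	assert(toy_kexec_may_place_image(TOY_MEMBLOCK_HOTPLUG));
	assert(!toy_kexec_indicates(TOY_MEMBLOCK_DRIVER_MANAGED));
	assert(!toy_kexec_may_place_image(TOY_MEMBLOCK_DRIVER_MANAGED));
}
```

The point of the sketch is only that the two flags answer the kexec questions in opposite ways, which is why folding them into one flag loses information.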


I would really advise against mixing concepts here.


What we could do is indicate *all* hotplugged memory (not just IORESOURCE_SYSRAM_DRIVER_MANAGED memory) as MEMBLOCK_HOTPLUG and make MEMBLOCK_HOTPLUG less dependent on "movable_node".

MEMBLOCK_HOTPLUG for early boot memory: with "movable_node", place it in ZONE_MOVABLE. Even without "movable_node", don't place early kernel allocations on this memory.
MEMBLOCK_HOTPLUG for all memory: don't place kexec images on this memory, independent of "movable_node".
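
As a rough sketch of what these two rules would amount to (userspace C, not kernel code; struct toy_range and the toy_* helpers are hypothetical names that exist only for this illustration):

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical description of a memory range under the proposal. */
struct toy_range {
	bool hotplug;     /* would carry MEMBLOCK_HOTPLUG */
	bool early_boot;  /* added during early boot, not hotplugged later */
};

/* Early boot memory only: with "movable_node", place it in ZONE_MOVABLE. */
static bool toy_wants_zone_movable(struct toy_range r, bool movable_node)
{
	return r.hotplug && r.early_boot && movable_node;
}

/* Early boot memory only: no early kernel allocations, even without "movable_node". */
static bool toy_early_alloc_allowed(struct toy_range r)
{
	return !(r.hotplug && r.early_boot);
}

/* All memory: never place kexec images, independent of "movable_node". */
static bool toy_kexec_image_allowed(struct toy_range r)
{
	return !r.hotplug;
}

/* Small usage example: a boot-time DIMM, later-added memory, ordinary memory. */
static void toy_proposal_selfcheck(void)
{
	struct toy_range boot_dimm = { .hotplug = true, .early_boot = true };
	struct toy_range added_later = { .hotplug = true, .early_boot = false };
	struct toy_range plain = { .hotplug = false, .early_boot = true };

	assert(toy_wants_zone_movable(boot_dimm, true));
	assert(!toy_wants_zone_movable(boot_dimm, false));
	assert(!toy_early_alloc_allowed(boot_dimm));
	assert(!toy_kexec_image_allowed(added_later));
	assert(toy_kexec_image_allowed(plain));
}
```

Note that nothing in this model records whether a range came from the firmware-provided memory map.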


memblock would then not contain the information "contained in firmware-provided memory map" vs. "not contained in firmware-provided memory map"; but I think right now it's not strictly required to have that information if we'd go down that path.

For example, ranges in the ACPI SRAT that are marked as
ACPI_SRAT_MEM_HOT_PLUGGABLE will be marked MEMBLOCK_HOTPLUG early during
boot (drivers/acpi/numa/srat.c:acpi_numa_memory_affinity_init()). Later, we
use that information to size ZONE_MOVABLE
(mm/page_alloc.c:find_zone_movable_pfns_for_nodes()). This will make sure
that these "hotpluggable" DIMMs can later get hotunplugged.

Also, see should_skip_region() for how this relates to the "movable_node"
kernel parameter:

	/* skip hotpluggable memory regions if needed */
	if (movable_node_is_enabled() && memblock_is_hotpluggable(m) &&
	    (flags & MEMBLOCK_HOTPLUG))
		return true;

Hmm, I think that the movable_node_is_enabled() check here is excessive,
but I suspect we cannot simply remove it without breaking anything.

The reasoning is: without "movable_node" we don't want this memory to be hotunpluggable; consequently, we don't care if we place kexec-images on this memory. MEMBLOCK_HOTPLUG is currently only active with "movable_node".

If we remove that check, we will never place early kernel allocations on that memory, even if we don't care about ZONE_MOVABLE.
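
A simplified userspace sketch of that trade-off follows. It is not the kernel's should_skip_region(): the function name and the keep_movable_node_check parameter are hypothetical, and the caller-supplied flags term of the real check is deliberately left out.

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Should an early allocation skip this region?
 * keep_movable_node_check = true models gating the skip on "movable_node";
 * false models removing that gate.
 */
static bool toy_skip_for_early_alloc(bool region_hotpluggable,
				     bool movable_node,
				     bool keep_movable_node_check)
{
	/* With the gate in place, nothing is skipped unless "movable_node" is set. */
	if (keep_movable_node_check && !movable_node)
		return false;

	return region_hotpluggable;
}

/* Small usage example for a hotpluggable region without "movable_node". */
static void toy_skip_selfcheck(void)
{
	/* Gate kept: the region stays usable for early allocations. */
	assert(!toy_skip_for_early_alloc(true, false, true));
	/* Gate removed: the region is skipped although ZONE_MOVABLE is not wanted. */
	assert(toy_skip_for_early_alloc(true, false, false));
}
```

With "movable_node" set, both variants behave the same; they only differ when it is not set.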


I'll take a deeper look at the potential consequences.

BTW, is there anything that prevents placing kexec images on hot-unpluggable
memory that was cold-plugged on boot?

I think it depends on how the platform handles hotunpluggable DIMMs or hotunpluggable NUMA nodes. If the platform ends up indicating such memory via MEMBLOCK_HOTPLUG, and "movable_node" is set, memory would be put into ZONE_MOVABLE and kexec would not place kexec-images on that memory.

--
Thanks,

David / dhildenb