Re: [PATCH v4 1/5] mm/memory_hotplug: introduce MEM_PREPARE_ONLINE/MEM_FINISH_OFFLINE notifiers

From: David Hildenbrand
Date: Tue Nov 28 2023 - 06:09:11 EST


On 28.11.23 12:03, Sumanth Korikkar wrote:
Introduce MEM_PREPARE_ONLINE/MEM_FINISH_OFFLINE memory notifiers to
prepare the transition of memory to and from a physically accessible
state. This enhancement is crucial for implementing the "memmap on
memory" feature for s390 in a subsequent patch.

Platforms such as x86 can support physical memory hotplug via ACPI. When
there is physical memory hotplug, ACPI event leads to the memory
addition with the following callchain:
acpi_memory_device_add()
-> acpi_memory_enable_device()
-> __add_memory()

After this, the hotplugged memory is physically accessible, and altmap
support prepared, before the "memmap on memory" initialization in
memory_block_online() is called.

On s390, memory hotplug works in a different way. The available hotplug
memory has to be defined upfront in the hypervisor, but it is made
physically accessible only when the user sets it online via sysfs,
currently in the MEM_GOING_ONLINE notifier. This is too late and "memmap
on memory" initialization is performed before calling MEM_GOING_ONLINE
notifier.

During the memory hotplug addition phase, altmap support is prepared and
during the memory onlining phase s390 requires memory to be physically
accessible and then subsequently initiate the "memmap on memory"
initialization process.

The memory provider will handle new MEM_PREPARE_ONLINE /
MEM_FINISH_OFFLINE notifications and make the memory accessible.

The mhp_flag MHP_OFFLINE_INACCESSIBLE is introduced and is relevant when
used along with MHP_MEMMAP_ON_MEMORY, because the altmap cannot be
written (e.g., poisoned) when adding memory -- before it is set online.
This allows for adding memory with an altmap that is not currently made
available by a hypervisor. When onlining that memory, the hypervisor can
be instructed to make that memory accessible via the new notifiers and
the onlining phase will not require any memory allocations, which is
helpful in low-memory situations.

All architectures ignore unknown memory notifiers. Therefore, the
introduction of these new notifiers does not result in any functional
modifications across architectures.

Suggested-by: Gerald Schaefer <gerald.schaefer@xxxxxxxxxxxxx>
Suggested-by: David Hildenbrand <david@xxxxxxxxxx>
Signed-off-by: Sumanth Korikkar <sumanthk@xxxxxxxxxxxxx>
---
drivers/base/memory.c | 23 ++++++++++++++++++++++-
include/linux/memory.h | 9 +++++++++
include/linux/memory_hotplug.h | 18 +++++++++++++++++-
include/linux/memremap.h | 1 +
mm/memory_hotplug.c | 13 ++++++++++++-
mm/sparse.c | 3 ++-
6 files changed, 63 insertions(+), 4 deletions(-)

diff --git a/drivers/base/memory.c b/drivers/base/memory.c
index 8a13babd826c..b99bcc70d6e5 100644
--- a/drivers/base/memory.c
+++ b/drivers/base/memory.c
@@ -188,6 +188,7 @@ static int memory_block_online(struct memory_block *mem)
unsigned long start_pfn = section_nr_to_pfn(mem->start_section_nr);
unsigned long nr_pages = PAGES_PER_SECTION * sections_per_block;
unsigned long nr_vmemmap_pages = 0;
+ struct memory_notify arg;
struct zone *zone;
int ret;
@@ -207,9 +208,19 @@ static int memory_block_online(struct memory_block *mem)
if (mem->altmap)
nr_vmemmap_pages = mem->altmap->free;
+ arg.altmap_start_pfn = start_pfn;
+ arg.altmap_nr_pages = nr_vmemmap_pages;
+ arg.start_pfn = start_pfn + nr_vmemmap_pages;
+ arg.nr_pages = nr_pages - nr_vmemmap_pages;
mem_hotplug_begin();
+ ret = memory_notify(MEM_PREPARE_ONLINE, &arg);
+ ret = notifier_to_errno(ret);
+ if (ret)
+ goto out_notifier;
+
if (nr_vmemmap_pages) {
- ret = mhp_init_memmap_on_memory(start_pfn, nr_vmemmap_pages, zone);
+ ret = mhp_init_memmap_on_memory(start_pfn, nr_vmemmap_pages,
+ zone, mem->altmap->inaccessible);
if (ret)
goto out;
}
@@ -231,7 +242,11 @@ static int memory_block_online(struct memory_block *mem)
nr_vmemmap_pages);
mem->zone = zone;
+ mem_hotplug_done();
+ return ret;
out:
+ memory_notify(MEM_FINISH_OFFLINE, &arg);
+out_notifier:
mem_hotplug_done();
return ret;
}
@@ -244,6 +259,7 @@ static int memory_block_offline(struct memory_block *mem)
unsigned long start_pfn = section_nr_to_pfn(mem->start_section_nr);
unsigned long nr_pages = PAGES_PER_SECTION * sections_per_block;
unsigned long nr_vmemmap_pages = 0;
+ struct memory_notify arg;
int ret;
if (!mem->zone)
@@ -275,6 +291,11 @@ static int memory_block_offline(struct memory_block *mem)
mhp_deinit_memmap_on_memory(start_pfn, nr_vmemmap_pages);
mem->zone = NULL;
+ arg.altmap_start_pfn = start_pfn;
+ arg.altmap_nr_pages = nr_vmemmap_pages;
+ arg.start_pfn = start_pfn + nr_vmemmap_pages;
+ arg.nr_pages = nr_pages - nr_vmemmap_pages;
+ memory_notify(MEM_FINISH_OFFLINE, &arg);
out:
mem_hotplug_done();
return ret;
diff --git a/include/linux/memory.h b/include/linux/memory.h
index f53cfdaaaa41..939a16bd5cea 100644
--- a/include/linux/memory.h
+++ b/include/linux/memory.h
@@ -96,8 +96,17 @@ int set_memory_block_size_order(unsigned int order);
#define MEM_GOING_ONLINE (1<<3)
#define MEM_CANCEL_ONLINE (1<<4)
#define MEM_CANCEL_OFFLINE (1<<5)
+#define MEM_PREPARE_ONLINE (1<<6)
+#define MEM_FINISH_OFFLINE (1<<7)
struct memory_notify {
+ /*
+ * The altmap_start_pfn and altmap_nr_pages fields are designated for
+ * specifying the altmap range and are exclusively intended for use in
+ * MEM_PREPARE_ONLINE/MEM_FINISH_OFFLINE notifiers.
+ */
+ unsigned long altmap_start_pfn;
+ unsigned long altmap_nr_pages;
unsigned long start_pfn;
unsigned long nr_pages;
int status_change_nid_normal;

Maybe we should do something like:

union {
/* For MEM_PREPARE_ONLINE/MEM_FINISH_OFFLINE ... */
struct {
unsigned long altmap_start_pfn;
unnsigned long altmap_nr_pages;
};
/* For all others ... */
struct {
int status_change_nid_normal;
...
};
}

Acked-by: David Hildenbrand <david@xxxxxxxxxx>

--
Cheers,

David / dhildenb