Re: [PATCH 0/8] implement "memmap on memory" feature on s390

From: Sumanth Korikkar
Date: Tue Nov 21 2023 - 08:13:42 EST


On Fri, Nov 17, 2023 at 04:37:29PM +0100, David Hildenbrand wrote:
> >
> > Maybe there is also already a common code bug with that, s390 might be
> > special but that is often also good for finding bugs in common code ...
>
> If it's only the page_init_poison() as noted by Sumanth, we could disable
> that on s390x with an altmap some way or the other; should be possible.
>
> I mean, you effectively have your own poisoning if the altmap is effectively
> inaccessible and makes your CPU angry on access :)
>
> Last but not least, support for an inaccessible altmap might come in handy
> for virtio-mem eventually, and make altmap support eventually simpler. So
> added bonus points.

We tried out two possibilities for dealing with vmemmap altmap
inaccessibility.

Approach 1:
===========
Add an MHP_ALTMAP_INACCESSIBLE flag and pass it to add_memory():

diff --git a/drivers/s390/char/sclp_cmd.c b/drivers/s390/char/sclp_cmd.c
index 075094ca59b4..ab2dfcc7e9e4 100644
--- a/drivers/s390/char/sclp_cmd.c
+++ b/drivers/s390/char/sclp_cmd.c
@@ -358,6 +358,13 @@ static int sclp_mem_notifier(struct notifier_block *nb,
* buddy allocator later.
*/
__arch_set_page_nodat((void *)__va(start), memory_block->altmap->free);
+ /*
+ * Poison the struct pages after the memory block is accessible.
+ * This is only needed with an altmap. Without an altmap, the struct
+ * pages are poisoned in sparse_add_section().
+ */
+ if (memory_block->altmap->inaccessible)
+ page_init_poison(pfn_to_page(arg->start_pfn), sizeof(struct page) * memory_block->altmap->free);
break;
case MEM_FINISH_OFFLINE:
sclp_mem_change_state(start, size, 0);
@@ -412,7 +419,7 @@ static void __init add_memory_merged(u16 rn)
goto skip_add;
for (addr = start; addr < start + size; addr += block_size)
add_memory(0, addr, block_size,
- MACHINE_HAS_EDAT1 ? MHP_MEMMAP_ON_MEMORY : MHP_NONE);
+ MACHINE_HAS_EDAT1 ? MHP_MEMMAP_ON_MEMORY|MHP_ALTMAP_INACCESSIBLE : MHP_NONE);
skip_add:
first_rn = rn;
num = 1;
diff --git a/include/linux/memory_hotplug.h b/include/linux/memory_hotplug.h
index 7d2076583494..5c70707e706f 100644
--- a/include/linux/memory_hotplug.h
+++ b/include/linux/memory_hotplug.h
@@ -106,6 +106,11 @@ typedef int __bitwise mhp_t;
* implies the node id (nid).
*/
#define MHP_NID_IS_MGID ((__force mhp_t)BIT(2))
+/*
+ * Mark memmap on memory (struct pages array) as inaccessible during memory
+ * hotplug addition phase.
+ */
+#define MHP_ALTMAP_INACCESSIBLE ((__force mhp_t)BIT(3))

/*
* Extended parameters for memory hotplug:
diff --git a/include/linux/memremap.h b/include/linux/memremap.h
index 744c830f4b13..9837f3e6fb95 100644
--- a/include/linux/memremap.h
+++ b/include/linux/memremap.h
@@ -25,6 +25,7 @@ struct vmem_altmap {
unsigned long free;
unsigned long align;
unsigned long alloc;
+ bool inaccessible;
};

/*
diff --git a/mm/memory_hotplug.c b/mm/memory_hotplug.c
index 7a5fc89a8652..d8299853cdcc 100644
--- a/mm/memory_hotplug.c
+++ b/mm/memory_hotplug.c
@@ -1439,6 +1439,8 @@ int __ref add_memory_resource(int nid, struct resource *res, mhp_t mhp_flags)
if (mhp_flags & MHP_MEMMAP_ON_MEMORY) {
if (mhp_supports_memmap_on_memory(size)) {
mhp_altmap.free = memory_block_memmap_on_memory_pages();
+ if (mhp_flags & MHP_ALTMAP_INACCESSIBLE)
+ mhp_altmap.inaccessible = true;
params.altmap = kmalloc(sizeof(struct vmem_altmap), GFP_KERNEL);
if (!params.altmap) {
ret = -ENOMEM;
diff --git a/mm/sparse.c b/mm/sparse.c
index 77d91e565045..3991c717b769 100644
--- a/mm/sparse.c
+++ b/mm/sparse.c
@@ -907,7 +907,8 @@ int __meminit sparse_add_section(int nid, unsigned long start_pfn,
* Poison uninitialized struct pages in order to catch invalid flags
* combinations.
*/
- page_init_poison(memmap, sizeof(struct page) * nr_pages);
+ if (!altmap || !altmap->inaccessible)
+ page_init_poison(memmap, sizeof(struct page) * nr_pages);

ms = __nr_to_section(section_nr);
set_section_nid(section_nr, nid);

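As a side note on the virtio-mem case mentioned above: a driver could ask
for the same behaviour through the existing add_memory_driver_managed()
interface. Purely a hypothetical sketch, not part of this patch:

/*
 * Hypothetical: a driver such as virtio-mem could request an
 * inaccessible altmap simply by OR-ing in the new flag when adding
 * its driver-managed memory.
 */
rc = add_memory_driver_managed(nid, addr, memory_block_size_bytes(),
			       "System RAM (virtio_mem)",
			       MHP_MEMMAP_ON_MEMORY | MHP_ALTMAP_INACCESSIBLE);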

Approach 2:
===========
Shouldn't the kasan zero shadow mapping be established first, before the
memmap is accessed/initialized via page_init_poison()? If that is true,
then it is a problem for all architectures and could be fixed like:


diff --git a/mm/memory_hotplug.c b/mm/memory_hotplug.c
index 7a5fc89a8652..eb3975740537 100644
--- a/mm/memory_hotplug.c
+++ b/mm/memory_hotplug.c
@@ -1093,6 +1093,7 @@ int mhp_init_memmap_on_memory(unsigned long pfn, unsigned long nr_pages,
if (ret)
return ret;

+ page_init_poison(pfn_to_page(pfn), sizeof(struct page) * nr_pages);
move_pfn_range_to_zone(zone, pfn, nr_pages, NULL, MIGRATE_UNMOVABLE);

for (i = 0; i < nr_pages; i++)
diff --git a/mm/sparse.c b/mm/sparse.c
index 77d91e565045..4ddf53f52075 100644
--- a/mm/sparse.c
+++ b/mm/sparse.c
@@ -906,8 +906,11 @@ int __meminit sparse_add_section(int nid, unsigned long start_pfn,
/*
* Poison uninitialized struct pages in order to catch invalid flags
* combinations.
+ * For altmap, do this later when onlining the memory, as it might
+ * not be accessible at this point.
*/
- page_init_poison(memmap, sizeof(struct page) * nr_pages);
+ if (!altmap)
+ page_init_poison(memmap, sizeof(struct page) * nr_pages);

ms = __nr_to_section(section_nr);
set_section_nid(section_nr, nid);



Also, if this approach is taken, should page_init_poison() be performed
with cond_resched(), as mentioned in commit d33695b16a9f
("mm/memory_hotplug: poison memmap in remove_pfn_range_from_zone()")?

Opinions?

Thank you