Re: [PATCH 0/8] implement "memmap on memory" feature on s390

From: David Hildenbrand
Date: Tue Nov 21 2023 - 14:25:08 EST


On 21.11.23 14:13, Sumanth Korikkar wrote:
On Fri, Nov 17, 2023 at 04:37:29PM +0100, David Hildenbrand wrote:

Maybe there is also already a common-code bug with that; s390 might be
special, but that is often also good for finding bugs in common code ...

If it's only the page_init_poison() as noted by Sumanth, we could disable
that on s390x with an altmap one way or the other; should be possible.

I mean, you effectively have your own poisoning if the altmap is
inaccessible and makes your CPU angry on access :)

Last but not least, support for an inaccessible altmap might come in handy
for virtio-mem eventually, and could eventually make altmap support simpler. So,
added bonus points.

We tried out two possibilities for dealing with vmemmap altmap inaccessibility.
Approach 1: Add MHP_ALTMAP_INACCESSIBLE flag and pass it in add_memory()

diff --git a/drivers/s390/char/sclp_cmd.c b/drivers/s390/char/sclp_cmd.c
index 075094ca59b4..ab2dfcc7e9e4 100644
--- a/drivers/s390/char/sclp_cmd.c
+++ b/drivers/s390/char/sclp_cmd.c
@@ -358,6 +358,13 @@ static int sclp_mem_notifier(struct notifier_block *nb,
* buddy allocator later.
*/
__arch_set_page_nodat((void *)__va(start), memory_block->altmap->free);
+ /*
+ * Poison the struct pages after the memory block is accessible.
+ * This is only needed with an altmap; without one, the struct
+ * pages are poisoned in sparse_add_section().
+ */
+ if (memory_block->altmap->inaccessible)
+ page_init_poison(pfn_to_page(arg->start_pfn), sizeof(struct page) * memory_block->altmap->free);

See below; likely it's best if the core does that.

break;
case MEM_FINISH_OFFLINE:
sclp_mem_change_state(start, size, 0);
@@ -412,7 +419,7 @@ static void __init add_memory_merged(u16 rn)
goto skip_add;
for (addr = start; addr < start + size; addr += block_size)
add_memory(0, addr, block_size,
- MACHINE_HAS_EDAT1 ? MHP_MEMMAP_ON_MEMORY : MHP_NONE);
+ MACHINE_HAS_EDAT1 ? MHP_MEMMAP_ON_MEMORY|MHP_ALTMAP_INACCESSIBLE : MHP_NONE);
skip_add:
first_rn = rn;
num = 1;
diff --git a/include/linux/memory_hotplug.h b/include/linux/memory_hotplug.h
index 7d2076583494..5c70707e706f 100644
--- a/include/linux/memory_hotplug.h
+++ b/include/linux/memory_hotplug.h
@@ -106,6 +106,11 @@ typedef int __bitwise mhp_t;
* implies the node id (nid).
*/
#define MHP_NID_IS_MGID ((__force mhp_t)BIT(2))
+/*
+ * Mark memmap on memory (struct pages array) as inaccessible during memory
+ * hotplug addition phase.
+ */
+#define MHP_ALTMAP_INACCESSIBLE ((__force mhp_t)BIT(3))

If we go that path, maybe rather use MHP_OFFLINE_INACCESSIBLE and document how it relates to/interacts with the memory notifier callbacks and the altmap.

Then we can logically abstract this from the altmap handling: simply, the memory should not be read/written before the memory notifier has been called.

In the core, you can do the poisoning for the altmap in that case after calling the notifier, probably in mhp_init_memmap_on_memory(), as you do below.
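
For illustration, a minimal sketch of such core-side poisoning (the per-block "inaccessible" marker and the helper name are hypothetical; the real series may wire this up differently, e.g. via a parameter to mhp_init_memmap_on_memory()):

/*
 * Sketch only, not the actual patch: called from the core (e.g. from
 * mhp_init_memmap_on_memory()) once the memory notifier has made the
 * standby memory, and with it the altmap-backed memmap, accessible.
 */
static void mhp_poison_memmap_on_memory(struct memory_block *mem)
{
	unsigned long start_pfn = section_nr_to_pfn(mem->start_section_nr);

	/* "inaccessible" is the hypothetical per-block marker discussed here. */
	if (!mem->altmap || !mem->inaccessible)
		return;

	/* Poison the struct pages that could not be poisoned in sparse_add_section(). */
	page_init_poison(pfn_to_page(start_pfn),
			 sizeof(struct page) * mem->altmap->free);
}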

/*
* Extended parameters for memory hotplug:
diff --git a/include/linux/memremap.h b/include/linux/memremap.h
index 744c830f4b13..9837f3e6fb95 100644
--- a/include/linux/memremap.h
+++ b/include/linux/memremap.h
@@ -25,6 +25,7 @@ struct vmem_altmap {
unsigned long free;
unsigned long align;
unsigned long alloc;
+ bool inaccessible;

We should then likely remember that information for the memory block, not the altmap.
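
Something along these lines, perhaps (a sketch only; existing fields elided, naming just for illustration):

/* include/linux/memory.h -- sketch, existing fields trimmed */
struct memory_block {
	unsigned long start_section_nr;
	unsigned long state;
	/* ... */
	struct vmem_altmap *altmap;
	/*
	 * Set when the block was added with MHP_OFFLINE_INACCESSIBLE (or
	 * whatever the flag ends up being called): neither the range nor
	 * its altmap-backed memmap may be touched before the memory
	 * notifier has made it accessible.
	 */
	bool inaccessible;
};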

[...]



Approach 2:
===========
Shouldn't the kasan zero shadow mapping be performed first, before
accessing/initializing the memmap via page_init_poison()? If that is

Likely the existing way. The first access to the poisoned memmap should be a write, not a read. But I didn't look into the details.

Can you try reproducing?

true, then it is a problem for all architectures and could be
fixed like:


diff --git a/mm/memory_hotplug.c b/mm/memory_hotplug.c
index 7a5fc89a8652..eb3975740537 100644
--- a/mm/memory_hotplug.c
+++ b/mm/memory_hotplug.c
@@ -1093,6 +1093,7 @@ int mhp_init_memmap_on_memory(unsigned long pfn, unsigned long nr_pages,
if (ret)
return ret;

+ page_init_poison(pfn_to_page(pfn), sizeof(struct page) * nr_pages);
move_pfn_range_to_zone(zone, pfn, nr_pages, NULL, MIGRATE_UNMOVABLE);

for (i = 0; i < nr_pages; i++)
diff --git a/mm/sparse.c b/mm/sparse.c
index 77d91e565045..4ddf53f52075 100644
--- a/mm/sparse.c
+++ b/mm/sparse.c
@@ -906,8 +906,11 @@ int __meminit sparse_add_section(int nid, unsigned long start_pfn,
/*
* Poison uninitialized struct pages in order to catch invalid flags
* combinations.
+ * For altmap, do this later when onlining the memory, as it might
+ * not be accessible at this point.
*/
- page_init_poison(memmap, sizeof(struct page) * nr_pages);
+ if (!altmap)
+ page_init_poison(memmap, sizeof(struct page) * nr_pages);

ms = __nr_to_section(section_nr);
set_section_nid(section_nr, nid);


That's too generic when it comes to other altmap users, especially DAX, or when the altmap is accessible while the memory is going offline and we still want to catch that.
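
In other words, if deferring the poisoning is needed at all, it should probably be keyed off an explicit marker rather than the mere presence of an altmap; roughly (a sketch, reusing the "inaccessible" marker from above purely for illustration):

/* mm/sparse.c -- sketch only: defer poisoning only when explicitly requested */
if (!altmap || !altmap->inaccessible)
	page_init_poison(memmap, sizeof(struct page) * nr_pages);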



Also, if this approach is taken, should page_init_poison() be performed
with cond_resched(), as mentioned in commit d33695b16a9f
("mm/memory_hotplug: poison memmap in remove_pfn_range_from_zone()")?

Likely as soon as possible after it is accessible :)
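
If poisoning a large memmap in one go is a latency concern (the cond_resched() question above), the writes could also be chunked; a minimal sketch, with an arbitrary chunk size:

/* Sketch only: poison a potentially large memmap range in chunks. */
static void page_init_poison_range(unsigned long pfn, unsigned long nr_pages)
{
	unsigned long chunk;

	while (nr_pages) {
		chunk = min(nr_pages, PAGES_PER_SECTION);
		page_init_poison(pfn_to_page(pfn), sizeof(struct page) * chunk);
		pfn += chunk;
		nr_pages -= chunk;
		cond_resched();
	}
}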


--
Cheers,

David / dhildenb