Re: [PATCH] mm/memory_hotplug.c: don't fail hot unplug quite so eagerly

From: John Hubbard
Date: Wed Jun 21 2023 - 22:22:44 EST


On 6/21/23 01:11, David Hildenbrand wrote:
...what about discerning between "user initiated offline_pages" and
"offline pages as part of a driver shutdown/unload"?

Makes sense to me.

There are two ways for triggering it directly from user space:

1) drivers/base/core.c:online_store()
2) drivers/base/memory.c:state_store()

We cannot easily hook into 2) to indicate "we're offlining directly
from user space". SO we might have to do it the other way around.


Something along the following lines should do the trick (expect whitespace damage):


diff --git a/mm/memory_hotplug.c b/mm/memory_hotplug.c
index 53ee7654f009..acd4b739505a 100644
--- a/mm/memory_hotplug.c
+++ b/mm/memory_hotplug.c
@@ -152,6 +152,13 @@ void put_online_mems(void)

 bool movable_node_enabled = false;

+/*
+ * Protected by the device hotplug lock. Indicates whether device offlining
+ * is triggered from try_offline_memory_block() such that we don't fail memory
+ * offlining if a signal is pending.
+ */
+static bool mhp_in_try_offline_memory_block;
+
 #ifndef CONFIG_MEMORY_HOTPLUG_DEFAULT_ONLINE
 int mhp_default_online_type = MMOP_OFFLINE;
 #else
@@ -1860,7 +1867,8 @@ int __ref offline_pages(unsigned long start_pfn, unsigned long nr_pages,
        do {
                pfn = start_pfn;
                do {
-                       if (signal_pending(current)) {
+                       if (!mhp_in_try_offline_memory_block &&
+                           signal_pending(current)) {
                                ret = -EINTR;
                                reason = "signal backoff";
                                goto failed_removal_isolated;
@@ -2177,7 +2185,9 @@ static int try_offline_memory_block(struct memory_block *mem, void *arg)
        if (page && zone_idx(page_zone(page)) == ZONE_MOVABLE)
                online_type = MMOP_ONLINE_MOVABLE;

+       mhp_in_try_offline_memory_block = true;
        rc = device_offline(&mem->dev);
+       mhp_in_try_offline_memory_block = false;
        /*
         * Default is MMOP_OFFLINE - change it only if offlining succeeded,
         * so try_reonline_memory_block() can do the right thing.



There is still arch/powerpc/platforms/pseries/hotplug-memory.c that calls
device_offline() and would fail on signals (not sure if relevant, like for virtio-mem it
shouldn't be that relevant).

I guess dlpar_remove_lmb() can now simply call offline_and_remove_memory().
[I might craft a patch later]


This direction looks good to me, I'd love to see a patch if you
put something together.


thanks,
--
John Hubbard
NVIDIA