Re: [PATCH v2 2/5] mm: memory_hotplug: Remove assumption on memory state before hotremove

From: Robin Murphy
Date: Mon Nov 27 2017 - 10:20:44 EST


On 24/11/17 15:54, Andrea Reale wrote:
On Fri 24 Nov 2017, 16:43, Michal Hocko wrote:
On Fri 24-11-17 14:49:17, Andrea Reale wrote:
Hi Rafael,

On Fri 24 Nov 2017, 15:39, Rafael J. Wysocki wrote:
On Fri, Nov 24, 2017 at 11:22 AM, Andrea Reale <ar@xxxxxxxxxxxxxxxxxx> wrote:
Resending the patch adding linux-acpi in CC, as suggested by Rafael.
Everyone else: apologies for the noise.

Commit 242831eb15a0 ("Memory hotplug / ACPI: Simplify memory removal")
introduced the assumption that, by the time control
reaches remove_memory, the corresponding memory has already been
offlined. In that case, acpi_memhotplug was making sure that
the assumption held.
This assumption, however, is not necessarily true if offlining
and removal are not done by the same "controller" (for example,
when first offlining via sysfs).

Remove this assumption from the generic remove_memory code
and move it into the specific acpi_memhotplug code. This is
a dependency for the software-aided arm64 offlining and removal
process.

Signed-off-by: Andrea Reale <ar@xxxxxxxxxxxxxxxxxx>
Signed-off-by: Maciej Bielski <m.bielski@xxxxxxxxxxxxxxxxxx>
---
drivers/acpi/acpi_memhotplug.c | 2 +-
include/linux/memory_hotplug.h | 9 ++++++---
mm/memory_hotplug.c | 13 +++++++++----
3 files changed, 16 insertions(+), 8 deletions(-)

diff --git a/drivers/acpi/acpi_memhotplug.c b/drivers/acpi/acpi_memhotplug.c
index 6b0d3ef..b0126a0 100644
--- a/drivers/acpi/acpi_memhotplug.c
+++ b/drivers/acpi/acpi_memhotplug.c
@@ -282,7 +282,7 @@ static void acpi_memory_remove_memory(struct acpi_memory_device *mem_device)
nid = memory_add_physaddr_to_nid(info->start_addr);

acpi_unbind_memory_blocks(info);
- remove_memory(nid, info->start_addr, info->length);
+ BUG_ON(remove_memory(nid, info->start_addr, info->length));

Why does this have to be BUG_ON()? Is it really necessary to kill the
system here?

Actually, I hoped you would help me understand that: that BUG() call was introduced
by yourself in commit 242831eb15a0 ("Memory hotplug / ACPI: Simplify memory removal")
in memory_hotplug.c:remove_memory().

Just reading that commit, my understanding was that you were assuming
that acpi_memory_remove_memory() had already done the job of offlining
the target memory, so there would be a bug if that wasn't the case.

In my case, that assumption did not hold and I found that it might not
hold for other platforms that do not use ACPI. In fact, the purpose of
this patch is to move this assumption out of the generic hotplug code
and move it to ACPI code where it originated.

remove_memory failure is basically impossible to handle AFAIR. The
original code to BUG in remove_memory is ugly as hell and we do not want
to spread that out of that function. Instead we really want to get rid
of it.

Today, BUG() is called even in the simple case where remove fails
because the section we are removing is not offline. I cannot see any need to
BUG() in such a case: an error code seems more than sufficient to me.
This is why this patch removes the BUG() call when the "offline" check
fails from the generic code.
It moves it back to the ACPI call, where the assumption
originated. Honestly, I cannot tell if it makes sense to BUG() there:
I have nothing against removing it from ACPI hotplug too, but
I don't know enough to feel free to change the acpi semantics myself, so I
moved it there to keep the original behavior unchanged for x86 code.

In this arm64 hot-remove port, offline and remove are done in two separate
steps, and it is conceivable that a user erroneously tries to remove a
section that they forgot to offline first: in that case, with the patch,
remove will just report an error without BUGing.

The user can already kill the system by misusing the sysfs probe driver; should similar theoretical misuse of your sysfs remove driver really need to be all that different?

Is my reasoning flawed?

Furthermore, even if your driver does want to enforce this, I don't see why it can't just do the equivalent of memory_subsys_offline() itself before even trying to call remove_memory().
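
To make the two positions above concrete, here is a minimal user-space sketch, not kernel code. It models a remove path that reports -EBUSY instead of BUGing when part of the range is still online, and a caller that offlines every block itself before asking for removal. Every name in it (model_block, model_offline, model_remove, model_offline_and_remove, the "pinned" flag) is hypothetical and invented for illustration; none of them is a real kernel interface, and the real remove_memory() and memory_subsys_offline() do far more than this.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical user-space model; none of these names are real kernel APIs. */
#define MODEL_NR_BLOCKS 8

enum model_state { MODEL_ABSENT, MODEL_ONLINE, MODEL_OFFLINE };

struct model_block {
	enum model_state state;
	bool pinned;	/* assumption: a pinned block cannot be offlined */
};

struct model_block model_blocks[MODEL_NR_BLOCKS];

/* Hot-add a range and bring it online (model only). */
void model_add_online(size_t start, size_t count)
{
	assert(start + count <= MODEL_NR_BLOCKS);
	for (size_t i = start; i < start + count; i++) {
		model_blocks[i].state = MODEL_ONLINE;
		model_blocks[i].pinned = false;
	}
}

/* Offline one block: -EINVAL if absent, -EBUSY if pinned, 0 otherwise. */
int model_offline(size_t idx)
{
	assert(idx < MODEL_NR_BLOCKS);
	if (model_blocks[idx].state == MODEL_ABSENT)
		return -EINVAL;
	if (model_blocks[idx].pinned)
		return -EBUSY;
	model_blocks[idx].state = MODEL_OFFLINE;
	return 0;
}

/*
 * Remove a range. If any block is still online, report -EBUSY and
 * leave the whole range untouched instead of crashing.
 */
int model_remove(size_t start, size_t count)
{
	assert(start + count <= MODEL_NR_BLOCKS);
	for (size_t i = start; i < start + count; i++)
		if (model_blocks[i].state == MODEL_ONLINE)
			return -EBUSY;
	for (size_t i = start; i < start + count; i++)
		model_blocks[i].state = MODEL_ABSENT;
	return 0;
}

/* Caller-side policy: offline every block first, then remove. */
int model_offline_and_remove(size_t start, size_t count)
{
	assert(start + count <= MODEL_NR_BLOCKS);
	for (size_t i = start; i < start + count; i++) {
		int ret = model_offline(i);

		if (ret)
			return ret;	/* propagate; the caller decides what to do */
	}
	return model_remove(start, count);
}
```

In this model, calling model_remove() on a range that is still online simply returns -EBUSY and changes nothing, while model_offline_and_remove() either completes both steps or hands the first offlining error back to the caller. Whether any caller should escalate such an error to BUG() is the policy question debated above; the sketch only shows that the generic path does not have to make that choice.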

Robin.


Cheers,
Andrea

--
Michal Hocko
SUSE Labs