Re: [PATCH V3 2/3] arm64/mm/hotplug: Enable MEM_OFFLINE event handling

From: Anshuman Khandual
Date: Wed Sep 23 2020 - 00:45:19 EST

On 09/21/2020 05:35 PM, Anshuman Khandual wrote:
> This enables MEM_OFFLINE memory event handling. It will help intercept any
> possible error condition such as if boot memory somehow still got offlined
> even after an explicit notifier failure, potentially by a future change in
> generic hot plug framework. This would help detect such scenarios and help
> debug further.
>
> Cc: Catalin Marinas <catalin.marinas@xxxxxxx>
> Cc: Will Deacon <will@xxxxxxxxxx>
> Cc: Mark Rutland <mark.rutland@xxxxxxx>
> Cc: Marc Zyngier <maz@xxxxxxxxxx>
> Cc: Steve Capper <steve.capper@xxxxxxx>
> Cc: Mark Brown <broonie@xxxxxxxxxx>
> Cc: linux-arm-kernel@xxxxxxxxxxxxxxxxxxx
> Cc: linux-kernel@xxxxxxxxxxxxxxx
> Signed-off-by: Anshuman Khandual <anshuman.khandual@xxxxxxx>
> ---
> arch/arm64/mm/mmu.c | 37 ++++++++++++++++++++++++++++++++-----
> 1 file changed, 32 insertions(+), 5 deletions(-)
>
> diff --git a/arch/arm64/mm/mmu.c b/arch/arm64/mm/mmu.c
> index df3b7415b128..6b171bd88bcf 100644
> --- a/arch/arm64/mm/mmu.c
> +++ b/arch/arm64/mm/mmu.c
> @@ -1482,13 +1482,40 @@ static int prevent_bootmem_remove_notifier(struct notifier_block *nb,
> 	unsigned long end_pfn = arg->start_pfn + arg->nr_pages;
> 	unsigned long pfn = arg->start_pfn;
>
> -	if (action != MEM_GOING_OFFLINE)
> +	if ((action != MEM_GOING_OFFLINE) && (action != MEM_OFFLINE))
> 		return NOTIFY_OK;
>
> -	for (; pfn < end_pfn; pfn += PAGES_PER_SECTION) {
> -		ms = __pfn_to_section(pfn);
> -		if (early_section(ms))
> -			return NOTIFY_BAD;
> +	if (action == MEM_GOING_OFFLINE) {
> +		for (; pfn < end_pfn; pfn += PAGES_PER_SECTION) {
> +			ms = __pfn_to_section(pfn);
> +			if (early_section(ms)) {
> +				pr_warn("Boot memory offlining attempted\n");
> +				return NOTIFY_BAD;
> +			}
> +		}
> +	} else if (action == MEM_OFFLINE) {
> +		for (; pfn < end_pfn; pfn += PAGES_PER_SECTION) {
> +			ms = __pfn_to_section(pfn);
> +			if (early_section(ms)) {
> +
> +				/*
> +				 * This should have never happened. Boot memory
> +				 * offlining should have been prevented by this
> +				 * very notifier. Probably some memory removal
> +				 * procedure might have changed which would then
> +				 * require further debug.
> +				 */
> +				pr_err("Boot memory offlined\n");

This returns at the first instance where a section inside the
offline range happens to be part of boot memory. So I am wondering
whether it would be better to report here the entire attempted
offline range, or just the first section inside it that overlaps
with boot memory? Either way, some range information here would be
helpful.
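
To make the two options concrete, here is a minimal user-space sketch, not
kernel code. The PAGE_SHIFT and SECTION_SIZE_BITS values are assumed for
illustration, and find_boot_overlap(), format_range(), struct boot_overlap
and demo_is_early() are hypothetical names standing in for the
early_section(__pfn_to_section(pfn)) check in the notifier. Unlike the
patch, it walks the whole range instead of returning at the first hit, so
that either the full range or just the first overlapping section can be
reported.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdio.h>

/* Assumed values for illustration; the kernel derives these from its config. */
#define PAGE_SHIFT		12
#define SECTION_SIZE_BITS	27
#define PAGES_PER_SECTION	(1UL << (SECTION_SIZE_BITS - PAGE_SHIFT))

/* Hypothetical stand-in for early_section(__pfn_to_section(pfn)). */
typedef bool (*early_section_fn)(unsigned long pfn);

/* Hypothetical result type: what overlapped with boot memory, if anything. */
struct boot_overlap {
	bool found;			/* any boot memory section in the range? */
	unsigned long first_pfn;	/* start pfn of the first such section */
	unsigned long sections;		/* number of such sections in the range */
};

/* Walk the range one section at a time and record boot memory overlaps. */
static struct boot_overlap find_boot_overlap(unsigned long start_pfn,
					     unsigned long nr_pages,
					     early_section_fn is_early)
{
	struct boot_overlap res = { false, 0, 0 };
	unsigned long end_pfn = start_pfn + nr_pages;
	unsigned long pfn;

	for (pfn = start_pfn; pfn < end_pfn; pfn += PAGES_PER_SECTION) {
		if (!is_early(pfn))
			continue;
		if (!res.found) {
			res.found = true;
			res.first_pfn = pfn;
		}
		res.sections++;
	}
	return res;
}

/* Format [start_pfn, end_pfn) as an inclusive physical address range. */
static int format_range(char *buf, size_t len,
			unsigned long start_pfn, unsigned long end_pfn)
{
	return snprintf(buf, len, "[%#lx-%#lx]",
			start_pfn << PAGE_SHIFT,
			(end_pfn << PAGE_SHIFT) - 1);
}

/* Demo predicate (assumption): the first two sections are boot memory. */
static bool demo_is_early(unsigned long pfn)
{
	return pfn < 2 * PAGES_PER_SECTION;
}
```

With something along these lines, the message could carry either
format_range() of the whole attempted range (start_pfn to end_pfn) or of
just the first overlapping section (first_pfn to first_pfn +
PAGES_PER_SECTION). The full range is cheaper to compute, while the first
overlapping section points more directly at the offending memory.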