Re: [PATCH v2 1/4] mm,hwpoison,hugetlb,memory_hotplug: hotremove memory section with hwpoisoned hugepage

From: Miaohe Lin
Date: Mon Sep 05 2022 - 23:00:08 EST


On 2022/9/5 14:21, Naoya Horiguchi wrote:
> From: Naoya Horiguchi <naoya.horiguchi@xxxxxxx>
>
> A hwpoisoned page is not supposed to be accessed once marked, but currently
> such accesses can happen during memory hotremove because do_migrate_range()
> can be called before dissolve_free_huge_pages() is called.
>
> Move dissolve_free_huge_pages() before scan_movable_pages(). Delayed
> dissolve has recently been implemented, so dissolving can turn
> a hwpoisoned hugepage into a 4kB hwpoison page, which memory hotplug can
> handle safely.

Yes, thanks for your work, Naoya. ;)
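To make the ordering argument above concrete, here is a toy model in C. It is only a sketch under stated assumptions, not kernel code: the enum, model_dissolve(), model_migrate() and model_offline() are hypothetical stand-ins for dissolve_free_huge_pages() and the scan_movable_pages()/do_migrate_range() step, and the only behaviour modelled is "migrating a hugepage reads its contents, so a hwpoisoned hugepage still present at that point gets accessed".

```c
#include <assert.h>

/*
 * Hypothetical toy model (not kernel code) of the ordering argument in the
 * commit message. A memory block is an array of page states; the helpers
 * below only mimic dissolving and migration.
 */
enum model_page {
	M_FREE_4K,       /* ordinary base page */
	M_HUGE,          /* hugepage without errors */
	M_HUGE_HWPOISON, /* hugepage containing a hwpoisoned subpage */
	M_HWPOISON_4K,   /* 4kB hwpoison page left after dissolving */
};

#define MODEL_NR_PAGES 4

/* Stand-in for dissolving: every hugepage is split into base pages; a
 * poisoned one leaves a single 4kB hwpoison page behind (assumption based
 * on the commit message's description of delayed dissolve). */
static void model_dissolve(enum model_page *pages, int nr)
{
	for (int i = 0; i < nr; i++) {
		if (pages[i] == M_HUGE)
			pages[i] = M_FREE_4K;
		else if (pages[i] == M_HUGE_HWPOISON)
			pages[i] = M_HWPOISON_4K;
	}
}

/* Stand-in for migration: copying a hugepage reads all of it, so each
 * poisoned hugepage still present here counts as an illegal access.
 * 4kB hwpoison pages are assumed to be skipped safely. */
static int model_migrate(const enum model_page *pages, int nr)
{
	int poisoned_reads = 0;

	for (int i = 0; i < nr; i++)
		if (pages[i] == M_HUGE_HWPOISON)
			poisoned_reads++;
	return poisoned_reads;
}

/* Returns the number of poisoned reads for the chosen ordering. */
static int model_offline(int dissolve_first)
{
	enum model_page pages[MODEL_NR_PAGES] = {
		M_FREE_4K, M_HUGE, M_HUGE_HWPOISON, M_FREE_4K,
	};
	int reads;

	if (dissolve_first)
		model_dissolve(pages, MODEL_NR_PAGES);
	reads = model_migrate(pages, MODEL_NR_PAGES);
	if (!dissolve_first)
		model_dissolve(pages, MODEL_NR_PAGES);

	/* With either ordering, no hugepage is left in the block. */
	for (int i = 0; i < MODEL_NR_PAGES; i++)
		assert(pages[i] != M_HUGE && pages[i] != M_HUGE_HWPOISON);
	return reads;
}
```

With the old order, model_offline(0) counts one read of the poisoned hugepage; with the patched order, model_offline(1) counts none.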

>
> Reported-by: Miaohe Lin <linmiaohe@xxxxxxxxxx>
> Signed-off-by: Naoya Horiguchi <naoya.horiguchi@xxxxxxx>
> ---
> mm/memory_hotplug.c | 22 +++++++++++-----------
> 1 file changed, 11 insertions(+), 11 deletions(-)
>
> diff --git a/mm/memory_hotplug.c b/mm/memory_hotplug.c
> index fad6d1f2262a..c24735d63b25 100644
> --- a/mm/memory_hotplug.c
> +++ b/mm/memory_hotplug.c
> @@ -1880,6 +1880,17 @@ int __ref offline_pages(unsigned long start_pfn, unsigned long nr_pages,
>
> cond_resched();
>
> + /*
> + * Dissolve free hugepages in the memory block before doing
> + * offlining actually in order to make hugetlbfs's object
> + * counting consistent.
> + */
> + ret = dissolve_free_huge_pages(start_pfn, end_pfn);
> + if (ret) {
> + reason = "failure to dissolve huge pages";
> + goto failed_removal_isolated;
> + }

This change has a side effect. If hugetlb pages in the range are in use, dissolve_free_huge_pages() will always
return -EBUSY, even if those pages could be migrated. So we would fail to hotremove the memory even though it
could have been offlined. Or am I missing something?
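A similar toy model sketches the concern. Again this is hypothetical: toy_dissolve(), toy_migrate() and toy_offline() are invented names, and the model assumes, as described above, that dissolving fails with -EBUSY whenever a hugepage in the range is still in use, and that migration leaves the source hugepage free (i.e. a migration target is available).

```c
#include <assert.h>
#include <errno.h>

/*
 * Hypothetical toy model (not kernel code). Only the -EBUSY convention is
 * meant to mirror dissolve_free_huge_pages(); everything else is invented.
 */
enum toy_page { T_BASE, T_FREE_HUGE, T_INUSE_HUGE };

#define TOY_NR_PAGES 3

/* Split free hugepages into base pages; give up with -EBUSY as soon as an
 * in-use hugepage is found, even if it could have been migrated. */
static int toy_dissolve(enum toy_page *pages, int nr)
{
	for (int i = 0; i < nr; i++) {
		if (pages[i] == T_INUSE_HUGE)
			return -EBUSY;
		if (pages[i] == T_FREE_HUGE)
			pages[i] = T_BASE;
	}
	return 0;
}

/* Migration moves the contents of an in-use hugepage elsewhere, leaving
 * the original hugepage free (assumes a target hugepage is available). */
static void toy_migrate(enum toy_page *pages, int nr)
{
	for (int i = 0; i < nr; i++)
		if (pages[i] == T_INUSE_HUGE)
			pages[i] = T_FREE_HUGE;
}

/* Returns 0 if the block could be offlined, or the dissolve error. */
static int toy_offline(int dissolve_first)
{
	enum toy_page pages[TOY_NR_PAGES] = {
		T_BASE, T_INUSE_HUGE, T_FREE_HUGE,
	};
	int ret;

	if (dissolve_first) {
		ret = toy_dissolve(pages, TOY_NR_PAGES);
		if (ret)
			return ret; /* like "goto failed_removal_isolated" */
		toy_migrate(pages, TOY_NR_PAGES);
	} else {
		toy_migrate(pages, TOY_NR_PAGES);
		ret = toy_dissolve(pages, TOY_NR_PAGES);
		if (ret)
			return ret;
	}

	/* On success every page in the block is a plain base page. */
	for (int i = 0; i < TOY_NR_PAGES; i++)
		assert(pages[i] == T_BASE);
	return 0;
}
```

Here toy_offline(0) (migrate, then dissolve) succeeds, while toy_offline(1) (dissolve first) returns -EBUSY for a block whose only in-use hugepage was migratable.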

Thanks,
Miaohe Lin

> +
> ret = scan_movable_pages(pfn, end_pfn, &pfn);
> if (!ret) {
> /*
> @@ -1895,17 +1906,6 @@ int __ref offline_pages(unsigned long start_pfn, unsigned long nr_pages,
> goto failed_removal_isolated;
> }
>
> - /*
> - * Dissolve free hugepages in the memory block before doing
> - * offlining actually in order to make hugetlbfs's object
> - * counting consistent.
> - */
> - ret = dissolve_free_huge_pages(start_pfn, end_pfn);
> - if (ret) {
> - reason = "failure to dissolve huge pages";
> - goto failed_removal_isolated;
> - }
> -
> ret = test_pages_isolated(start_pfn, end_pfn, MEMORY_OFFLINE);
>
> } while (ret);
>