Re: [RFC PATCH] mm,memory_hotplug: Unlock 1GB-hugetlb on x86_64

From: Michal Hocko
Date: Thu Feb 28 2019 - 04:16:05 EST


On Thu 28-02-19 08:38:34, David Hildenbrand wrote:
> On 27.02.19 23:00, Mike Kravetz wrote:
> > On 2/27/19 1:51 PM, Oscar Salvador wrote:
> >> On Thu, Feb 21, 2019 at 10:42:12AM +0100, Oscar Salvador wrote:
> >>> [1] https://lore.kernel.org/patchwork/patch/998796/
> >>>
> >>> Signed-off-by: Oscar Salvador <osalvador@xxxxxxx>
> >>
> >> Any further comments on this?
> >> I do have a "concern" I would like to sort out before dropping the RFC:
> >>
> >> It is that, unless we have spare gigantic pages on other nodes, the
> >> offlining operation will loop forever (until the customer cancels the operation).
> >> While I do not really like that, I do think that memory offlining should be done
> >> with some sanity, and the administrator should know in advance whether the system is going
> >> to be able to keep up with the memory pressure, i.e. make sure we have what we need in
> >> order for the offlining operation to succeed.
> >> That translates to making sure that we have spare gigantic pages and that other nodes
> >> can take them.
> >>
> >> That said, another thing I thought about is that we could check whether we have
> >> spare gigantic pages at has_unmovable_pages() time.
> >> Something like checking "h->free_huge_pages - h->resv_huge_pages > 0", and if it
> >> turns out that we do not have spare gigantic pages anywhere, just return as if we had
> >> unmovable pages.
> >
> > Of course, that check would be racy. Even if there is an available gigantic
> > page at has_unmovable_pages() time, there is no guarantee it will still be there when
> > we want to allocate/use it. But you would at least catch most of the cases of
> > looping forever.
> >
> >> But I would rather not convolute has_unmovable_pages() with such checks, and instead "trust"
> >> the administrator.
>
> I think we have the exact same issue already with huge/ordinary pages if
> we are low on memory. We could loop forever.
>
> In the long run, I guess we should properly detect such issues and abort
> instead of looping forever. But as we all know, error handling in the
> whole offlining path is still far from perfect ...

Migration allocation callbacks use __GFP_RETRY_MAYFAIL so that they are
not disruptive: they do not trigger the OOM killer and instead rely on
somebody else to pull the trigger. This means that if there is no other
activity on the system, the hotplug migration would just loop forever
or until interrupted by userspace. The latter is important: the user
might define a policy for when to terminate, and retrying is not
necessarily the wrong thing. One can simply do
timeout $TIMEOUT echo 0 > $PATH_TO_MEMBLOCK/online

and ENOMEM handling does not matter. But I can see how people might
want to bail out early instead, so I do not have a strong opinion here.
We can try to treat ENOMEM from migration as a hard failure, bail
out, and see whether that works in practice.
--
Michal Hocko
SUSE Labs