Re: [PATCH] memory-hotplug: Fix bad area access on dissolve_free_huge_pages()

From: Dave Hansen
Date: Tue Sep 20 2016 - 13:43:20 EST


On 09/20/2016 08:52 AM, Rui Teng wrote:
> On 9/20/16 10:53 PM, Dave Hansen wrote:
...
>> That's good, but aren't we still left with a situation where we've
>> offlined and dissolved the _middle_ of a gigantic huge page while the
>> head page is still in place and online?
>>
>> That seems bad.
>>
> What about refusing to change the status of such a memory block if it
> contains a huge page that is larger than the block itself? (See
> memory_block_action().)

How will this be visible to users, though? That sounds like you simply
won't be able to offline memory with gigantic huge pages.

> I think it will not affect the hot-plug function too much. We can
> set nr_hugepages to zero first if we really want to hot-plug
> memory.

Is that really feasible? Suggest that folks stop using hugetlbfs before
offlining any memory? Isn't the entire point of hotplug to keep the
system running while you change the memory present? Doing this would
require that you stop your applications that are using huge pages.

With gigantic pages, you may also never get them back if you do this.

> I also found that the __test_page_isolated_in_pageblock() function
> cannot handle a gigantic page well. It causes a device-busy error
> later. I am still investigating that.
>
> Any suggestion?

It sounds like the _first_ offline operation needs to dissolve an
_entire_ page if that page has any portion in the section being
offlined. I'm not quite sure where the page should live after that, but
I'm not sure of any other way to do this sanely.
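
As a rough illustration of the rule above, here is a minimal user-space
sketch, not kernel code. The struct hpage_span type and the helper names
are hypothetical and exist only for this example; the assumption is that a
huge page can be described by its head PFN and its order. It shows the
overlap test ("any portion in the section being offlined") and that the
range to dissolve is the entire page, not just the overlapping part.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for a (gigantic) huge page: head PFN plus order. */
struct hpage_span {
	unsigned long head_pfn;	/* first page frame of the huge page */
	unsigned int order;	/* the page covers 1UL << order frames */
};

/* One past the last PFN covered by the huge page. */
static unsigned long hpage_end_pfn(const struct hpage_span *hp)
{
	assert(hp->order < sizeof(unsigned long) * 8);
	return hp->head_pfn + (1UL << hp->order);
}

/* True if any part of the huge page lies in [start_pfn, end_pfn). */
static bool hpage_overlaps(const struct hpage_span *hp,
			   unsigned long start_pfn, unsigned long end_pfn)
{
	return hp->head_pfn < end_pfn && hpage_end_pfn(hp) > start_pfn;
}

/*
 * If the huge page overlaps the range being offlined, report the range
 * that would have to be dissolved: the whole page, from its head PFN to
 * its end, even when only the middle of it is being offlined.
 */
static bool hpage_dissolve_range(const struct hpage_span *hp,
				 unsigned long start_pfn,
				 unsigned long end_pfn,
				 unsigned long *dissolve_start,
				 unsigned long *dissolve_end)
{
	if (!hpage_overlaps(hp, start_pfn, end_pfn))
		return false;

	*dissolve_start = hp->head_pfn;
	*dissolve_end = hpage_end_pfn(hp);
	return true;
}
```

The point of the sketch is only the range arithmetic: the dissolve range
is derived from the huge page, not from the section. Where the freed
pages should go afterwards is left open, as in the text above.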