Re: [PATCH v2] mm: hugetlb_vmemmap: provide stronger vmemmap allocation guarantees

From: David Rientjes
Date: Fri Apr 14 2023 - 20:47:35 EST


On Thu, 13 Apr 2023, Michal Hocko wrote:

> [...]
> > > > This is a theoretical concern. Freeing a 1G page requires 16M of free
> > > > memory. A machine might need to be reconfigured from one task to
> > > > another, and release a large number of 1G pages back to the system. If
> > > > allocating 16M fails, the release won't work.
> > >
> > > This is really an important "detail" changelog should mention. While I
> > > am not really against that change I would much rather see that as a
> > > result of a real world fix rather than a theoretical concern. Mostly
> > > because a real life scenario would allow us to test the
> > > __GFP_RETRY_MAYFAIL effectiveness. As that request might fail as well we
> > > just end up with a theoretical fix for a theoretical problem. Something
> > > that is easy to introduce but much harder to get rid of should we ever
> > > need to change __GFP_RETRY_MAYFAIL implementation for example.
> >
> > I will add this to changelog in v3. If __GFP_RETRY_MAYFAIL is
> > ineffective we will receive feedback once someone hits this problem.
>
> I do not remember anybody hitting this with the current __GFP_NORETRY.
> So arguably there is nothing to be fixed ATM.
>

I think we should still, at the very least, clear __GFP_NORETRY in this
allocation: to be able to free 1GB hugepages back to the system, we'd like
the page allocator to exercise its normal order-0 allocation logic rather
than exempting it from retrying reclaim by opting into __GFP_NORETRY.

I agree with the analysis in
https://lore.kernel.org/linux-mm/YCafit5ruRJ+SL8I@xxxxxxxxxxxxxx/ that
either clearing __GFP_NORETRY or using __GFP_RETRY_MAYFAIL makes logical
sense.

We really *do* want to free these hugepages back to the system, and the
amount of memory freed will always be more than the allocation needed for
their struct pages. The net result is more free memory.

On a saturated node, if our first reclaim attempt doesn't free enough
memory for these struct pages to be allocated, the allocation fails and
we can't free the 1GB back to the system. Stranding 1GB in the hugetlb
pool when no userspace on the system can make use of it at the time isn't
very useful.