Re: [PATCH v2 4/5] mm: FLEXIBLE_THP for improved performance

From: David Hildenbrand
Date: Fri Jul 07 2023 - 07:30:20 EST


On 07.07.23 11:52, Ryan Roberts wrote:
On 07/07/2023 09:01, Huang, Ying wrote:
Ryan Roberts <ryan.roberts@xxxxxxx> writes:

Introduce FLEXIBLE_THP feature, which allows anonymous memory to be
allocated in large folios of a specified order. All pages of the large
folio are pte-mapped during the same page fault, significantly reducing
the number of page faults. The number of per-page operations (e.g. ref
counting, rmap management, lru list management) is also significantly
reduced since those ops now become per-folio.

I like the idea of sharing as much code as possible between large
(anonymous) folios and THP. Finally, THP becomes just a special kind of
large folio.

Although we can use smaller page order for FLEXIBLE_THP, it's hard to
avoid internal fragmentation completely. So, I think that finally we
will need to provide a mechanism for the users to opt out, e.g.,
something like "always madvise never" via
/sys/kernel/mm/transparent_hugepage/enabled. I'm not sure whether it's
a good idea to reuse the existing interface of THP.
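For reference, the existing toggle reports all available modes on one line and marks the active one with square brackets, e.g. "always [madvise] never". A minimal sketch of how user space might read it is below; the helper names thp_active_mode() and thp_read_policy() are made up for illustration and are not part of any kernel or libc interface.

```c
#include <stdio.h>
#include <string.h>

/*
 * Copy the bracketed (active) mode out of a line such as
 * "always [madvise] never". Returns 0 on success, -1 if no
 * bracket pair is found or the output buffer is too small.
 * (Hypothetical helper, for illustration only.)
 */
int thp_active_mode(const char *line, char *out, size_t outlen)
{
	const char *open = strchr(line, '[');
	const char *close = open ? strchr(open, ']') : NULL;
	size_t len;

	if (!open || !close)
		return -1;

	len = (size_t)(close - open - 1);
	if (len + 1 > outlen)
		return -1;

	memcpy(out, open + 1, len);
	out[len] = '\0';
	return 0;
}

/*
 * Read the system-wide THP policy from sysfs. Returns -1 if the
 * file is missing (e.g. kernel built without THP) or unreadable.
 */
int thp_read_policy(char *out, size_t outlen)
{
	char line[128];
	FILE *f = fopen("/sys/kernel/mm/transparent_hugepage/enabled", "r");

	if (!f)
		return -1;
	if (!fgets(line, sizeof(line), f)) {
		fclose(f);
		return -1;
	}
	fclose(f);
	return thp_active_mode(line, out, outlen);
}
```

Whether large anon folios should follow this file or get a separate control is exactly the open question in this thread; the sketch only shows the format of the interface as it exists today.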

I wouldn't want to tie this to the existing interface, simply because that
implies that we would want to follow the "always" and "madvise" advice too. That
means that on a thp=madvise system (which is certainly the case for Android and
other client systems) we would have to disable large anon folios for VMAs that
haven't explicitly opted in. That breaks the intention that this should be an
invisible performance boost. I think it's important to set the policy for use of

It will never ever be a completely invisible performance boost, just like ordinary THP.

Using the exact same existing toggle is the right thing to do. If someone specifies "never" or "madvise", then do exactly that.

It might make sense to have more modes or additional toggles, but "madvise=never" means no memory waste.


I remember I raised it already in the past, but you *absolutely* have to respect the MADV_NOHUGEPAGE flag. There is user space out there (for example, userfaultfd) that doesn't want the kernel to populate any additional page tables. So if you have to respect that already, then also respect MADV_HUGEPAGE, simple.
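To illustrate the kind of user space meant here, this is roughly what a per-range opt-out looks like today. It is only a sketch: map_anon_nohugepage() is a made-up helper name, while mmap(), madvise() and MADV_NOHUGEPAGE are the existing Linux interfaces.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE
#endif
#include <stddef.h>
#include <sys/mman.h>

/*
 * Map len bytes of private anonymous memory and ask the kernel not
 * to back the range with huge pages. Returns the mapping, or NULL
 * if mmap() fails. *advice_ok is set to 1 if madvise() accepted
 * MADV_NOHUGEPAGE and to 0 otherwise (for example EINVAL on a
 * kernel built without THP support).
 * (Hypothetical helper, for illustration only.)
 */
void *map_anon_nohugepage(size_t len, int *advice_ok)
{
	void *p = mmap(NULL, len, PROT_READ | PROT_WRITE,
		       MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);

	if (p == MAP_FAILED)
		return NULL;

	*advice_ok = (madvise(p, len, MADV_NOHUGEPAGE) == 0);
	return p;
}
```

An application that made this call expects faults in the range to populate only the page that was touched, which is the expectation the argument above says a large anon folio fault path would have to keep honouring.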

THP separately to use of large anon folios.

I could be persuaded on the merits of a new runtime enable/disable interface if
there is consensus.

There would have to be a very good reason for a completely separate control. Bypassing MADV_NOHUGEPAGE or "madvise=never" simply because we add a "flexible" before the THP sounds broken.

--
Cheers,

David / dhildenb