Re: [RFC PATCH 1/5] mm: intorduce __GFP_UNMAPPED and unmapped_alloc()

From: Mike Rapoport
Date: Thu May 18 2023 - 11:24:12 EST


On Wed, May 17, 2023 at 11:35:56PM -0400, Kent Overstreet wrote:
> On Wed, Mar 08, 2023 at 11:41:02AM +0200, Mike Rapoport wrote:
> > From: "Mike Rapoport (IBM)" <rppt@xxxxxxxxxx>
> >
> > When set_memory or set_direct_map APIs used to change attribute or
> > permissions for chunks of several pages, the large PMD that maps these
> > pages in the direct map must be split. Fragmenting the direct map in such
> > manner causes TLB pressure and, eventually, performance degradation.
> >
> > To avoid excessive direct map fragmentation, add ability to allocate
> > "unmapped" pages with __GFP_UNMAPPED flag that will cause removal of the
> > allocated pages from the direct map and use a cache of the unmapped pages.
> >
> > This cache is replenished with higher order pages with preference for
> > PMD_SIZE pages when possible so that there will be fewer splits of large
> > pages in the direct map.
> >
> > The cache is implemented as a buddy allocator, so it can serve high order
> > allocations of unmapped pages.
>
> So I'm late to this discussion, I stumbled in because of my own run in
> with executable memory allocation.
>
> I understand that post LSF this patchset seems to not be going anywhere,
> but OTOH there's also been a desire for better executable memory
> allocation; as noted by tglx and elsewhere, there _is_ a definite
> performance impact on page size with kernel text - I've seen numbers in
> the multiple single digit percentage range in the past.
>
> This patchset does seem to me to be roughly the right approach for that,
> and coupled with the slab allocator for sub-page sized allocations it
> seems there's the potential for getting a nice interface that spans the
> full range of allocation sizes, from small bpf/trampoline allocations up
> to modules.
>
> Is this patchset worth reviving/continuing with? Was it really just the
> needed module refactoring that was the blocker?

As I see it, this patchset only one building block out of three? four?
If we are to repurpose it for code allocations it should be something like

1) allocate 2M page to fill the cache
2) remove this page from the direct map
3) map the 2M page ROX in module address space (usually some part of
vmalloc address space)
4) allocate a smaller chunk of that page to the actual caller (bpf,
modules, whatever)

Right now (3) and (4) won't work for modules because they mix code and data
in a single allocation.

So module refactoring is a blocker and another blocker is to teach vmalloc
to map the areas for the executable memory with 2M pages and probably
something else.

I remember there was an attempt to add VM_ALLOW_HUGE_VMAP to
x86::module_alloc(), but it caused problems and was reverted. Sorry, I
could not find the lore link.

There was a related discussion here:

https://lore.kernel.org/all/14D6DBA0-0572-44FB-A566-464B1FF541E0@xxxxxx/

--
Sincerely yours,
Mike.