Re: [RFC PATCH 1/5] mm: intorduce __GFP_UNMAPPED and unmapped_alloc()

From: Song Liu
Date: Thu May 18 2023 - 16:52:29 EST


On Thu, May 18, 2023 at 12:15 PM Kent Overstreet
<kent.overstreet@xxxxxxxxx> wrote:
>
> On Thu, May 18, 2023 at 12:03:03PM -0700, Song Liu wrote:
> > On Thu, May 18, 2023 at 11:47 AM Song Liu <song@xxxxxxxxxx> wrote:
> > >
> > > On Thu, May 18, 2023 at 10:24 AM Kent Overstreet
> > > <kent.overstreet@xxxxxxxxx> wrote:
> > > >
> > > > On Thu, May 18, 2023 at 10:00:39AM -0700, Song Liu wrote:
> > > > > On Thu, May 18, 2023 at 9:48 AM Kent Overstreet
> > > > > <kent.overstreet@xxxxxxxxx> wrote:
> > > > > >
> > > > > > On Thu, May 18, 2023 at 09:33:20AM -0700, Song Liu wrote:
> > > > > > > I am working on patches based on the discussion in [1]. I am planning to
> > > > > > > send v1 for review in a week or so.
> > > > > >
> > > > > > Hey Song, I was reviewing that thread too,
> > > > > >
> > > > > > Are you taking a different approach based on Thomas's feedback? I think
> > > > > > he had some fair points in that thread.
> > > > >
> > > > > Yes, the API is based on Thomas's suggestion, like 90% from the discussions.
> > > > >
> > > > > >
> > > > > > My own feeling is that the buddy allocator is our tool for allocating
> > > > > > larger variable sized physically contiguous allocations, so I'd like to
> > > > > > see something based on that - I think we could do a hybrid buddy/slab
> > > > > > allocator approach, like we have for regular memory allocations.
> > > > >
> > > > > I am planning to implement the allocator based on this (reuse
> > > > > vmap_area logic):
> > > >
> > > > Ah, you're still doing vmap_area approach.
> > > >
> > > > Mike's approach looks like it'll be _much_ lighter weight and higher
> > > > performance, to me. vmalloc is known to be slow compared to the buddy
> > > > allocator, and with Mike's approach we're only modifying mappings once
> > > > per 2 MB chunk.
> > > >
> > > > I don't see anything in your code for sub-page sized allocations too, so
> > > > perhaps I should keep going with my slab allocator.
> > >
> > > The vmap_area approach handles sub-page allocations. In 5/5 of set [2],
> > > we showed that multiple BPF programs share the same page with some
> > > kernel text (_etext).
> > >
> > > > Could you share your thoughts on your approach vs. Mike's? I'm newer to
> > > > this area of the code than you two so maybe there's an angle I've missed
> > > > :)
> > >
> > > AFAICT, tree based solution (vmap_area) is more efficient than bitmap
> > > based solution.
>
> Tree based requires quite a bit of overhead for the rbtree pointers, and
> additional vmap_area structs.
>
> With a buddy allocator based approach, there's no additional state that
> needs to be allocated, since it all fits in struct page.

To allocate memory for text, we will allocate a 2MiB region, make it ROX, and
then use it for many small allocations. IIUC, a buddy/slab-style allocator
keeps its freelist metadata in the unallocated parts of the region itself. I
guess this may be a problem: the whole region is ROX now, so every metadata
update would require a text_poke write.

OTOH, if we allocate extra memory for the metadata (the tree-based solution),
all the metadata operations can be regular reads/writes.

Thanks,
Song