Re: [PATCH 0/1] mm: Remove the SLAB allocator

From: Michal Hocko
Date: Wed Apr 17 2019 - 09:38:58 EST

Next message: Vlastimil Babka: "Re: [PATCH 0/3] vmalloc enhancements"
Previous message: Vlastimil Babka: "Re: [PATCH 3/3] mm: show number of vmalloc pages in /proc/meminfo"
In reply to: Christopher Lameter: "Re: [PATCH 0/1] mm: Remove the SLAB allocator"
Next in thread: Jesper Dangaard Brouer: "Re: [PATCH 0/1] mm: Remove the SLAB allocator"
Messages sorted by: [ date ] [ thread ] [ subject ] [ author ]

On Wed 17-04-19 10:50:18, Jesper Dangaard Brouer wrote:
> On Thu, 11 Apr 2019 11:27:26 +0300
> Pekka Enberg <penberg@xxxxxx> wrote:
>
> > Hi,
> >
> > On 4/11/19 10:55 AM, Michal Hocko wrote:
> > > Please please have it more rigorous than what happened when SLUB was
> > > forced to become the default.
> >
> > This is the hard part.
> >
> > Even if you are able to show that SLUB is as fast as SLAB for all the
> > benchmarks you run, there's bound to be that one workload where SLUB
> > regresses. You will then have people complaining about that (rightly so)
> > and you're again stuck with two allocators.
> >
> > To move forward, I think we should look at possible *pathological* cases
> > where we think SLAB might have an advantage. For example, SLUB had much
> > more difficulty with remote CPU frees than SLAB. Now I don't know if
> > this is the case, but it should be easy to construct a synthetic
> > benchmark to measure this.
>
> I do think SLUB has a number of pathological cases where SLAB is
> faster. It was significantly more difficult to get good bulk-free
> performance for SLUB. SLUB is only fast as long as objects belong to
> the same page. To get good bulk-free performance when objects are
> "mixed", I coded this[1] way-too-complex fast-path code to counteract
> this (joint work with Alex Duyck).
>
> [1] https://github.com/torvalds/linux/blob/v5.1-rc5/mm/slub.c#L3033-L3113

How often is this a real problem for real workloads?

> > For example, have a userspace process that does networking, which is
> > often memory allocation intensive, so that we know that SKBs traverse
> > between CPUs. You can do this by making sure that the NIC queues are
> > mapped to CPU N (so that network softirqs have to run on that CPU) but
> > the process is pinned to CPU M.
>
> If someone wants to test this with SKBs, then be aware that we netdev guys
> have a number of optimizations that try to counteract this (at a
> minimum, disable TSO and GRO).
>
> It might also be possible for people to get inspired by and adapt the
> micro-benchmarking[2] kernel modules that I wrote when developing the
> SLUB and SLAB optimizations:
>
> [2] https://github.com/netoptimizer/prototype-kernel/tree/master/kernel/mm

While microbenchmarks are good for exposing pathological behavior, I would be
really interested to see some numbers for real-world use cases.

> > It's, of course, worth thinking about other pathological cases too.
> > Workloads that cause large allocations are one. Workloads that cause lots
> > of slab cache shrinking are another.
>
> I also worry about long uptimes, when SLUB objects/pages get too
> fragmented... As I said, SLUB is only efficient when objects are
> returned to the same page, while SLAB does not have that limitation.

Is this something that has actually been measured in a real deployment?
--
Michal Hocko
SUSE Labs
