Re: [tip: sched/core] sched: Fix performance regression introduced by mm_cid

From: Sebastian Andrzej Siewior
Date: Fri Jun 23 2023 - 02:52:43 EST


On 2023-06-22 10:33:13 [-0400], Mathieu Desnoyers wrote:
>
> What was fundamentally wrong with the per-cpu caches before commit
> df323337e50 other than being non-RT friendly ? Was the only purpose of that
> commit to reduce the duration of preempt-off critical sections, or is there
> a bigger picture concern it was taking care of by introducing a global pool
> ?

There were memory allocations within preempt-disabled sections
introduced by get_cpu_ptr(). This didn't fly on PREEMPT_RT. After
looking at this on 2 node, 64 CPUs box I didn't get anywhere near
complete usage of the allocated buffers per-CPU buffers. It looked
wasteful.
Based on my testing back then, it looked sufficient to use a global
buffer.

> Introducing per-cpu memory pools, dealing with migration by giving entries
> back to the right cpu's pool, taking into account the cpu the entry belongs
> to, and use a per-cpu/lock-free data structure allowing lock-free push to
> give back an entry on a remote cpu should do the trick without locking, and
> without long preempt-off critical sections.
>
> The only downside I see for per-cpu memory pools is a slightly larger memory
> overhead on large multi-core systems. But is that really a concern ?

Yes, if the memory is left unused and can't be reclaimed if needed.

> What am I missing here ?

I added (tried to add) an people claimed that the SLUB allocated got
better over the years. Also on NUMA systems it might be better to use it
since the memory is NUMA local. The allocation is for 8KiB by default.
Is there a test-case/ benchmark I could try this vs kmalloc()?

> Thanks,
>
> Mathieu

Sebastian