Re: [RFC PATCH 00/86] Make the kernel preemptible

From: Vlastimil Babka
Date: Wed Nov 08 2023 - 02:54:30 EST


On 11/8/23 06:12, Steven Rostedt wrote:
> On Tue, 7 Nov 2023 20:52:39 -0800 (PST)
> Christoph Lameter <cl@xxxxxxxxx> wrote:
>
>> On Tue, 7 Nov 2023, Ankur Arora wrote:
>>
>> > This came up in an earlier discussion (See
>> > https://lore.kernel.org/lkml/87cyyfxd4k.ffs@tglx/) and Thomas mentioned
>> > that preempt_enable/_disable() overhead was relatively minimal.
>> >
>> > Is your point that always-on preempt_count is far too expensive?
>>
>> Yes over the years distros have traditionally delivered their kernels by
>> default without preemption because of these issues. If the overhead has
>> been minimized then that may have changed. Even if so there is still a lot
>> of code being generated that has questionable benefit and just
>> bloats the kernel.
>>
>> >> These are needed to avoid adding preempt_enable/disable to a lot of primitives
>> >> that are used for synchronization. You cannot remove those without changing a
>> >> lot of synchronization primitives to always have to consider being preempted
>> >> while operating.
>> >
>> > I'm afraid I don't understand why you would need to change any
>> > synchronization primitives. The code that does preempt_enable/_disable()
>> > is compiled out because CONFIG_PREEMPT_NONE/_VOLUNTARY don't define
>> > CONFIG_PREEMPT_COUNT.
>>
>> In the trivial cases it is simple like that. But look f.e.
>> in the slub allocator at the #ifdef CONFIG_PREEMPTION section. There is an
>> overhead added to be able to allow the cpu to change under us. There are
>> likely other examples in the source.
>>
>
> preempt_disable() and preempt_enable() have much lower overhead today than
> they used to.
>
> If you are worried about changing CPUs, there's also migrate_disable().

Note that while migrate_disable() would often be sufficient, its
implementation actually has more overhead (it is a function call, and it does
preempt_disable()/enable() as part of it) than plain preempt_disable(). See
for example the pcpu_task_pin() definition in mm/page_alloc.c.
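
To make the comparison concrete, here is a rough userspace sketch, not the
kernel code. The two counters and the function bodies are simplified
stand-ins invented for illustration; the real migrate_disable() does more
bookkeeping than this. The pin/unpin macros are modeled on the shape of the
pcpu_task_pin() definition referred to above, where preempt_disable() is
used unless CONFIG_PREEMPT_RT is set.

```c
#include <assert.h>

/* Userspace stand-ins for illustration only; NOT the kernel implementations. */
static int preempt_count;       /* models the per-task preemption counter */
static int migration_disabled;  /* models the migrate-disable nesting depth */

/* Cheap path: an inline counter update, no function call. */
static inline void preempt_disable(void) { preempt_count++; }
static inline void preempt_enable(void)
{
	assert(preempt_count > 0);
	preempt_count--;
}

/*
 * More expensive path: out of line, and it brackets its own bookkeeping
 * with preempt_disable()/preempt_enable().
 */
void migrate_disable(void)
{
	preempt_disable();
	migration_disabled++;
	preempt_enable();
}

void migrate_enable(void)
{
	preempt_disable();
	assert(migration_disabled > 0);
	migration_disabled--;
	preempt_enable();
}

/* Pick the cheaper primitive unless preemption must stay enabled (RT). */
#ifndef CONFIG_PREEMPT_RT
#define pcpu_task_pin()		preempt_disable()
#define pcpu_task_unpin()	preempt_enable()
#else
#define pcpu_task_pin()		migrate_disable()
#define pcpu_task_unpin()	migrate_enable()
#endif
```

In this sketch, building without CONFIG_PREEMPT_RT makes pcpu_task_pin() a
single inline increment, while the migrate_disable() variant costs a call
plus a preempt_disable()/preempt_enable() pair on each side.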