Re: [RFC PATCH 00/86] Make the kernel preemptible

From: Ankur Arora
Date: Tue Nov 07 2023 - 23:34:55 EST



Christoph Lameter <cl@xxxxxxxxx> writes:

> The kernel is not preemptible???? What are you smoking?

The title admittedly is a little tongue in cheek but the point was that
a kernel under a voluntary preemption model isn't preemptible. Making it
preemptible is what this series attempts to do: essentially, enable
PREEMPT_COUNT and PREEMPTION for all preemption models.

PREEMPT_COUNT is always enabled with PREEMPT_DYNAMIC as well. There the
approach is to toggle which preemption points are used dynamically.
Here the idea is to not have statically placed preemption points and let
the scheduler decide when preemption is warranted.
And the only way to safely do that is by having PREEMPT_COUNT=y.

>> In voluntary models, the scheduler's job is to match the demand
>> side of preemption points (a task that needs to be scheduled) with
>> the supply side (a task which calls cond_resched().)
>
> Voluntary preemption models are important for code optimization because the code
> can rely on the scheduler not changing the cpu we are running on. This allows
> code for preempt_enable/disable to be removed and allows
> better code generation. The best performing code is generated with defined
> preemption points when we have a guarantee that the code is not being
> rescheduled on a different processor. This is f.e. important for consistent
> access to PER CPU areas.

Right. The approach here necessitates preempt_disable()/preempt_enable()
around such accesses so you get consistent access to the CPU.

This came up in an earlier discussion (See
https://lore.kernel.org/lkml/87cyyfxd4k.ffs@tglx/) and Thomas mentioned
that preempt_enable/_disable() overhead was relatively minimal.

Is your point that always-on preempt_count is far too expensive?

>> To do this add a new flag, TIF_NEED_RESCHED_LAZY which allows the
>> scheduler to mark that a reschedule is needed, but is deferred until
>> the task finishes executing in the kernel -- voluntary preemption
>> as it were.
>
> That is different from the current no preemption model? Seems to be the same.
>> There's just one remaining issue: now that explicit preemption points are
>> gone, processes that spend a long time in the kernel have no way to give
>> up the CPU.
>
> These are needed to avoid adding preempt_enable/disable to a lot of primitives
> that are used for synchronization. You cannot remove those without changing a
> lot of synchronization primitives to always have to consider being preempted
> while operating.

I'm afraid I don't understand why you would need to change any
synchronization primitives. The code that does preempt_enable/_disable()
is compiled out because CONFIG_PREEMPT_NONE/_VOLUNTARY don't define
CONFIG_PREEMPT_COUNT.

The intent here is to always have CONFIG_PREEMPT_COUNT=y.
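
To make the difference concrete, here is a rough user-space sketch of
what the counter buys you. It is not the kernel's code: every sketch_*
name and SKETCH_PREEMPT_COUNT are made up for illustration, the latter
standing in for CONFIG_PREEMPT_COUNT. It assumes the counter is paired
with PREEMPTION, so the outermost enable is where a pending reschedule
gets acted on.

```c
#include <assert.h>

/* Hypothetical stand-in for CONFIG_PREEMPT_COUNT; build with
 * -DSKETCH_PREEMPT_COUNT=0 to see the compiled-out variant. */
#ifndef SKETCH_PREEMPT_COUNT
#define SKETCH_PREEMPT_COUNT 1
#endif

static int sketch_count;         /* nesting depth of non-preemptible sections */
static int sketch_need_resched;  /* set when the scheduler wants the CPU back */
static int sketch_reschedules;   /* number of simulated reschedules */

#if SKETCH_PREEMPT_COUNT
static void sketch_preempt_disable(void)
{
	sketch_count++;
}

static void sketch_preempt_enable(void)
{
	assert(sketch_count > 0);
	/* Leaving the outermost section: act on a pending request. */
	if (--sketch_count == 0 && sketch_need_resched) {
		sketch_need_resched = 0;
		sketch_reschedules++;
	}
}

static int sketch_preemptible(void)
{
	return sketch_count == 0;
}
#else
/* Without the counter both calls do nothing, and there is no state
 * from which to tell whether preempting right now would be safe. */
static void sketch_preempt_disable(void) { }
static void sketch_preempt_enable(void) { }
static int sketch_preemptible(void) { return 0; }
#endif

/* Model a per-CPU update with a reschedule request arriving midway.
 * Returns whether the section looked preemptible at that point. */
static int sketch_percpu_update(void)
{
	int was_preemptible;

	sketch_preempt_disable();
	sketch_need_resched = 1;             /* request arrives mid-section */
	was_preemptible = sketch_preemptible();
	sketch_preempt_enable();             /* deferred until here */

	return was_preemptible;
}
```

With the counter enabled, the request that arrives inside the section
is held off until the matching enable, which is the property that lets
the scheduler decide when to preempt instead of relying on statically
placed preemption points.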

--
ankur