Re: [RFC PATCH 43/86] sched: enable PREEMPT_COUNT, PREEMPTION for all preemption models

From: Peter Zijlstra
Date: Wed Nov 08 2023 - 04:59:04 EST


On Tue, Nov 07, 2023 at 01:57:29PM -0800, Ankur Arora wrote:
> The scheduler uses PREEMPT_COUNT and PREEMPTION to drive
> preemption: the first to demarcate non-preemptible sections and
> the second for the actual mechanics of preemption.
>
> Enable both for voluntary preemption models.
>
> In addition, define a new scheduler feature FORCE_PREEMPT which
> can now be used to distinguish between voluntary and full
> preemption models at runtime.
>
> Originally-by: Thomas Gleixner <tglx@xxxxxxxxxxxxx>
> Signed-off-by: Ankur Arora <ankur.a.arora@xxxxxxxxxx>
> ---
> init/Makefile | 2 +-
> kernel/Kconfig.preempt | 12 ++++++++----
> kernel/entry/common.c | 3 +--
> kernel/sched/core.c | 26 +++++++++++---------------
> kernel/sched/features.h | 6 ++++++
> 5 files changed, 27 insertions(+), 22 deletions(-)
>
> diff --git a/init/Makefile b/init/Makefile
> index 385fd80fa2ef..99e480f24cf3 100644
> --- a/init/Makefile
> +++ b/init/Makefile
> @@ -24,7 +24,7 @@ mounts-$(CONFIG_BLK_DEV_INITRD) += do_mounts_initrd.o
> #
>
> smp-flag-$(CONFIG_SMP) := SMP
> -preempt-flag-$(CONFIG_PREEMPT) := PREEMPT
> +preempt-flag-$(CONFIG_PREEMPTION) := PREEMPT_DYNAMIC
> preempt-flag-$(CONFIG_PREEMPT_RT) := PREEMPT_RT
>
> build-version = $(or $(KBUILD_BUILD_VERSION), $(build-version-auto))
> diff --git a/kernel/Kconfig.preempt b/kernel/Kconfig.preempt
> index aa87b5cd3ecc..074fe5e253b5 100644
> --- a/kernel/Kconfig.preempt
> +++ b/kernel/Kconfig.preempt
> @@ -6,20 +6,23 @@ choice
>
> config PREEMPT_NONE
> bool "No Forced Preemption (Server)"
> + select PREEMPTION
> help
> This is the traditional Linux preemption model, geared towards
> throughput. It will still provide good latencies most of the
> - time, but there are no guarantees and occasional longer delays
> - are possible.
> + time, but occasional delays are possible.
>
> Select this option if you are building a kernel for a server or
> scientific/computation system, or if you want to maximize the
> raw processing power of the kernel, irrespective of scheduling
> - latencies.
> + latencies. Unless your architecture actively disables preemption,
> + you can always switch to one of the other preemption models
> + at runtime.


> diff --git a/kernel/entry/common.c b/kernel/entry/common.c
> index 6433e6c77185..f7f2efabb5b5 100644
> --- a/kernel/entry/common.c
> +++ b/kernel/entry/common.c
> @@ -422,8 +422,7 @@ noinstr void irqentry_exit(struct pt_regs *regs, irqentry_state_t state)
> }
>
> instrumentation_begin();
> - if (IS_ENABLED(CONFIG_PREEMPTION))
> - irqentry_exit_cond_resched();
> + irqentry_exit_cond_resched();
> /* Covers both tracing and lockdep */
> trace_hardirqs_on();
> instrumentation_end();

I'm totally confused by the PREEMPT_NONE changes here. How does that
make sense?
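
For reference, a minimal userspace sketch of what the quoted hunks appear to change, not kernel code. Every name in it (cpu_state, irq_exit_resched_old, irq_exit_resched_new) is hypothetical. The quoted excerpt does not show how irqentry_exit_cond_resched() consults FORCE_PREEMPT, so the runtime check in the "new" variant is an assumption drawn only from the changelog's statement that the feature distinguishes voluntary from full preemption at runtime.

```c
#include <stdbool.h>

/* Hypothetical per-CPU state; these are not kernel structures. */
struct cpu_state {
	int  preempt_count;  /* > 0: inside a non-preemptible section */
	bool need_resched;   /* a reschedule has been requested */
};

/*
 * Before the patch, per the removed lines in kernel/entry/common.c:
 * the resched check on IRQ exit is only reached when the kernel was
 * built with CONFIG_PREEMPTION, so PREEMPT_NONE never gets here.
 */
static bool irq_exit_resched_old(bool config_preemption,
				 const struct cpu_state *cpu)
{
	if (!config_preemption)
		return false;
	return cpu->need_resched && cpu->preempt_count == 0;
}

/*
 * After the patch: the call is unconditional and PREEMPT_NONE selects
 * PREEMPTION, so the build-time gate is gone. ASSUMPTION: the decision
 * moves to a runtime flag modelled on the FORCE_PREEMPT feature.
 */
static bool irq_exit_resched_new(bool force_preempt,
				 const struct cpu_state *cpu)
{
	if (!force_preempt)
		return false;
	return cpu->need_resched && cpu->preempt_count == 0;
}
```

Under that assumption, a PREEMPT_NONE build behaves like the old code only while the runtime flag is off. The model name alone no longer guarantees that IRQ-exit preemption is absent.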