Re: [PATCH v2 7/9] sched: define TIF_ALLOW_RESCHED

From: Ingo Molnar
Date: Tue Sep 19 2023 - 04:43:17 EST



* Ingo Molnar <mingo@xxxxxxxxxx> wrote:

> > Yeah, the fact that we do presumably have PREEMPT_COUNT enabled in most
> > distros does speak for just admitting that the PREEMPT_NONE / VOLUNTARY
> > approach isn't actually used, and is only causing pain.
>
> The macro-behavior of NONE/VOLUNTARY is still used & relied upon in
> server distros - and that's the behavior that enterprise distros truly
> cared about.
>
> Micro-overhead of NONE/VOLUNTARY vs. FULL is nonzero but is in the
> 'noise' category for all major distros I'd say.
>
> And that's what Thomas's proposal achieves: keep the nicely
> execution-batched NONE/VOLUNTARY scheduling behavior for SCHED_OTHER
> tasks, while having the latency advantages of fully-preemptible kernel
> code for RT and critical tasks.
>
> So I'm fully on board with this. It would reduce the number of preemption
> variants to just two: regular kernel and PREEMPT_RT. Yummie!

As an additional side note: with various changes such as EEVDF the
scheduler is a lot less preemption-happy these days, without wrecking
latencies & timeslice distribution.

So in principle we might not even need the NEED_RESCHED_LAZY extra bit,
which -rt uses as a kind of additional layer to make sure they don't change
scheduling policy.

I.e. a modern scheduler might have mooted much of this change:

4542057e18ca ("mm: avoid 'might_sleep()' in get_mmap_lock_carefully()")

... because now we'll only reschedule on timeslice exhaustion, or if a task
comes in with a big deadline deficit.

And even the deadline-deficit wakeup preemption can be turned off further
with:

$ echo NO_WAKEUP_PREEMPTION > /sys/kernel/debug/sched/features

And we are considering making that the default behavior for same-prio tasks
- basically turn same-prio SCHED_OTHER tasks into SCHED_BATCH - which
should be quite similar to what NEED_RESCHED_LAZY achieves on -rt.
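
To check which way the WAKEUP_PREEMPTION knob is currently set before
flipping it, something like the sketch below might help. It assumes the
features file is a whitespace-separated list in which a disabled feature
shows up with a NO_ prefix, and that debugfs is mounted at the usual
/sys/kernel/debug. The helper name sched_feat_state is made up for
illustration; it is not an existing tool.

```shell
#!/bin/sh
# Hypothetical helper (not part of any kernel tooling): print "on", "off"
# or "unknown" for a scheduler feature.
#   $1 = feature name without the NO_ prefix, e.g. WAKEUP_PREEMPTION
#   $2 = features file; defaults to the usual debugfs location (assumption)
sched_feat_state() {
    feat=$1
    file=${2:-/sys/kernel/debug/sched/features}
    # The file is read as a whitespace-separated list of feature names.
    for f in $(cat "$file"); do
        case "$f" in
            "$feat")    echo on;  return 0 ;;
            "NO_$feat") echo off; return 0 ;;
        esac
    done
    echo unknown
    return 1
}
```

Running "sched_feat_state WAKEUP_PREEMPTION" (typically as root, since
debugfs is usually not readable by normal users) reports the current
state; the second argument exists only so the parsing can be tried on a
copy of the file.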

Thanks,

Ingo