Re: [RFC PATCH 00/86] Make the kernel preemptible
From: Thomas Gleixner
Date: Wed Nov 08 2023 - 10:38:21 EST
On Wed, Nov 08 2023 at 11:13, Peter Zijlstra wrote:
> On Wed, Nov 08, 2023 at 02:04:02AM -0800, Ankur Arora wrote:
> I'm not understanding, those should stay obviously.
>
> The current preempt_dynamic stuff has 5 toggles:
>
> /*
> * SC:cond_resched
> * SC:might_resched
> * SC:preempt_schedule
> * SC:preempt_schedule_notrace
> * SC:irqentry_exit_cond_resched
> *
> *
> * NONE:
> * cond_resched <- __cond_resched
> * might_resched <- RET0
> * preempt_schedule <- NOP
> * preempt_schedule_notrace <- NOP
> * irqentry_exit_cond_resched <- NOP
> *
> * VOLUNTARY:
> * cond_resched <- __cond_resched
> * might_resched <- __cond_resched
> * preempt_schedule <- NOP
> * preempt_schedule_notrace <- NOP
> * irqentry_exit_cond_resched <- NOP
> *
> * FULL:
> * cond_resched <- RET0
> * might_resched <- RET0
> * preempt_schedule <- preempt_schedule
> * preempt_schedule_notrace <- preempt_schedule_notrace
> * irqentry_exit_cond_resched <- irqentry_exit_cond_resched
> */
>
> If you kill voluntary as we know it today, you can remove cond_resched
> and might_resched, but the remaining 3 are still needed to switch
> between NONE and FULL.
No. The whole point of LAZY is to keep preempt_schedule(),
preempt_schedule_notrace(), irqentry_exit_cond_resched() always enabled.
Look at my PoC: https://lore.kernel.org/lkml/87jzshhexi.ffs@tglx/
The idea is to always enable preempt count and keep _all_ preemption
points enabled.
For NONE/VOLUNTARY mode let the scheduler set TIF_NEED_RESCHED_LAZY
instead of TIF_NEED_RESCHED. In full mode set TIF_NEED_RESCHED.
Here is where the regular and the lazy flags are evaluated:
                Ret2user   Ret2kernel   PreemptCnt=0   need_resched()
NEED_RESCHED       Y           Y             Y               Y
LAZY_RESCHED       Y           N             N               Y
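The table can be read as a simple predicate. A minimal userspace sketch, not the PoC code: the enum names, the flag bit values and should_resched_at() are made up for illustration, and the real TIF_* bit positions are architecture specific.

```c
#include <stdbool.h>

/* Illustrative flag bits; the real TIF_* values differ per architecture. */
#define NEED_RESCHED       (1u << 0)
#define NEED_RESCHED_LAZY  (1u << 1)

/* Hypothetical names for the four evaluation points (table columns). */
enum resched_point {
	RET2USER,           /* return to user space */
	RET2KERNEL,         /* interrupt return to kernel code */
	PREEMPT_CNT_ZERO,   /* preempt count 1->0 transition */
	NEED_RESCHED_CHECK, /* explicit need_resched() query */
};

/* Returns true when the given flags lead to a reschedule at point p. */
static bool should_resched_at(enum resched_point p, unsigned int flags)
{
	/* The regular flag is honoured at every point. */
	if (flags & NEED_RESCHED)
		return true;

	/* The lazy flag is only honoured at the two "Y" columns. */
	if (flags & NEED_RESCHED_LAZY)
		return p == RET2USER || p == NEED_RESCHED_CHECK;

	return false;
}
```

So should_resched_at(RET2KERNEL, NEED_RESCHED_LAZY) is false, while the same point with NEED_RESCHED is true.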
The trick is that LAZY is not folded into preempt_count, so a 1->0
counter transition won't cause preempt_schedule() to be invoked because
the topmost bit (the inverted NEED_RESCHED bit) is still set.
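To illustrate the folding, here is a rough userspace model. It assumes the x86 convention, where NEED_RESCHED is stored inverted in the top bit of preempt_count (bit set means no reschedule is pending). struct task_model and the model_*() helpers are hypothetical names, not kernel API.

```c
#include <stdbool.h>
#include <stdint.h>

/* Inverted bit, as on x86: set == no reschedule pending. */
#define PREEMPT_NEED_RESCHED 0x80000000u

struct task_model {
	uint32_t preempt_count; /* idle state: only the top bit set */
	bool lazy;              /* models TIF_NEED_RESCHED_LAZY, NOT folded */
};

/* Regular request: clear the inverted bit inside the counter. */
static void model_set_need_resched(struct task_model *t)
{
	t->preempt_count &= ~PREEMPT_NEED_RESCHED;
}

/* Lazy request: lives outside the counter and leaves it untouched. */
static void model_set_need_resched_lazy(struct task_model *t)
{
	t->lazy = true;
}

static void model_preempt_disable(struct task_model *t)
{
	t->preempt_count++;
}

/* Returns true when the 1->0 transition would call preempt_schedule(). */
static bool model_preempt_enable(struct task_model *t)
{
	return --t->preempt_count == 0;
}
```

With only the lazy request pending the counter drops back to 0x80000000 on enable, which is not zero, so nothing happens. Once the regular request clears the top bit, the same decrement hits zero.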
The scheduler can still decide to set TIF_NEED_RESCHED which will cause
an immediate preemption at the next preemption point.
This allows forcing out a task which loops, e.g. in a massive copy or
clear operation, when it has not reached a point where
TIF_NEED_RESCHED_LAZY is evaluated within a time which is defined by the
scheduler itself.
For my PoC I did:
1) Set TIF_NEED_RESCHED_LAZY
2) Set TIF_NEED_RESCHED when the task did not react to
TIF_NEED_RESCHED_LAZY within a tick
I know that's crude but it just works and obviously requires quite some
refinement.
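The two steps boil down to something like the following. This is a userspace sketch under stated assumptions, not the PoC code: the struct and function names are invented, and whether the lazy bit is cleared on escalation is left open here (the sketch keeps it set).

```c
#include <stdbool.h>

struct task_model {
	bool need_resched;      /* models TIF_NEED_RESCHED */
	bool need_resched_lazy; /* models TIF_NEED_RESCHED_LAZY */
};

/* Step 1: the scheduler wants the CPU back and asks politely first. */
static void model_request_resched_lazy(struct task_model *t)
{
	t->need_resched_lazy = true;
}

/*
 * Step 2: called from the modelled tick. If the lazy request is still
 * pending, the task did not reach a lazy preemption point in time, so
 * escalate to the regular flag.
 */
static void model_tick(struct task_model *t)
{
	if (t->need_resched_lazy)
		t->need_resched = true;
}

/* The task went through schedule(); both requests are satisfied. */
static void model_task_scheduled(struct task_model *t)
{
	t->need_resched = false;
	t->need_resched_lazy = false;
}
```

A task which schedules before the next tick never sees the regular flag. One which keeps looping gets it at the tick and is preempted at the next preemption point.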
So the way to switch between preemption modes is to select when the
scheduler sets TIF_NEED_RESCHED/TIF_NEED_RESCHED_LAZY. No static call
switching at all.
In full preemption mode it always sets TIF_NEED_RESCHED and otherwise it
uses the LAZY bit first, grants some time and then gets out the hammer
and sets TIF_NEED_RESCHED when the task did not reach a LAZY preemption
point.
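In other words the mode is just a piece of scheduler state which decides which flag to set. A minimal sketch, assuming a plain runtime variable: current_mode, resched_flag() and the flag values are hypothetical, and the real decision would live in the scheduler's resched path.

```c
#include <stdbool.h>

/* Illustrative flag bits; the real TIF_* values differ per architecture. */
#define NEED_RESCHED       (1u << 0)
#define NEED_RESCHED_LAZY  (1u << 1)

enum preempt_mode { MODE_NONE, MODE_VOLUNTARY, MODE_FULL };

/* Plain variable: writing to it at runtime is the whole mode switch. */
static enum preempt_mode current_mode = MODE_NONE;

/*
 * Pick the flag the scheduler sets for a task it wants off the CPU.
 * grace_expired models "the task did not reach a LAZY preemption
 * point in the time granted".
 */
static unsigned int resched_flag(bool grace_expired)
{
	if (current_mode == MODE_FULL || grace_expired)
		return NEED_RESCHED;

	return NEED_RESCHED_LAZY;
}
```

All preemption points stay compiled in and enabled. Only the choice of flag changes, so nothing needs to be patched when the mode changes.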
Which means once the whole thing is in place then the whole
PREEMPT_DYNAMIC along with NONE, VOLUNTARY, FULL can go away along with
the cond_resched() hackery.
So I think this series is backwards.
It should add the LAZY muck with a Kconfig switch like I did in my PoC
_first_. Once that is working and agreed on, the existing muck can be
removed.
Thanks,
tglx