Re: [RFC PATCH 00/86] Make the kernel preemptible

From: Ankur Arora
Date: Wed Nov 08 2023 - 15:28:22 EST



Thomas Gleixner <tglx@xxxxxxxxxxxxx> writes:

> On Wed, Nov 08 2023 at 11:13, Peter Zijlstra wrote:
>> On Wed, Nov 08, 2023 at 02:04:02AM -0800, Ankur Arora wrote:
>> I'm not understanding, those should stay obviously.
>>
>> The current preempt_dynamic stuff has 5 toggles:
>>
>> /*
>>  * SC:cond_resched
>>  * SC:might_resched
>>  * SC:preempt_schedule
>>  * SC:preempt_schedule_notrace
>>  * SC:irqentry_exit_cond_resched
>>  *
>>  *
>>  * NONE:
>>  *   cond_resched               <- __cond_resched
>>  *   might_resched              <- RET0
>>  *   preempt_schedule           <- NOP
>>  *   preempt_schedule_notrace   <- NOP
>>  *   irqentry_exit_cond_resched <- NOP
>>  *
>>  * VOLUNTARY:
>>  *   cond_resched               <- __cond_resched
>>  *   might_resched              <- __cond_resched
>>  *   preempt_schedule           <- NOP
>>  *   preempt_schedule_notrace   <- NOP
>>  *   irqentry_exit_cond_resched <- NOP
>>  *
>>  * FULL:
>>  *   cond_resched               <- RET0
>>  *   might_resched              <- RET0
>>  *   preempt_schedule           <- preempt_schedule
>>  *   preempt_schedule_notrace   <- preempt_schedule_notrace
>>  *   irqentry_exit_cond_resched <- irqentry_exit_cond_resched
>>  */
>>
>> If you kill voluntary as we know it today, you can remove cond_resched
>> and might_resched, but the remaining 3 are still needed to switch
>> between NONE and FULL.
>
> No. The whole point of LAZY is to keep preempt_schedule(),
> preempt_schedule_notrace(), irqentry_exit_cond_resched() always enabled.
>
> Look at my PoC: https://lore.kernel.org/lkml/87jzshhexi.ffs@tglx/
>
> The idea is to always enable preempt count and keep _all_ preemption
> points enabled.
>
> For NONE/VOLUNTARY mode let the scheduler set TIF_NEED_RESCHED_LAZY
> instead of TIF_NEED_RESCHED. In full mode set TIF_NEED_RESCHED.
>
> Here is where the regular and the lazy flags are evaluated:
>
>               Ret2user  Ret2kernel  PreemptCnt=0  need_resched()
>
> NEED_RESCHED  Y         Y           Y             Y
> LAZY_RESCHED  Y         N           N             Y
>
> The trick is that LAZY is not folded into preempt_count so a 1->0
> counter transition won't cause preempt_schedule() to be invoked because
> the topmost bit (NEED_RESCHED) is set.
>
> The scheduler can still decide to set TIF_NEED_RESCHED which will cause
> an immediate preemption at the next preemption point.
>
> This allows to force out a task which loops, e.g. in a massive copy or
> clear operation, as it did not reach a point where TIF_NEED_RESCHED_LAZY
> is evaluated after a time which is defined by the scheduler itself.
>
> For my PoC I did:
>
> 1) Set TIF_NEED_RESCHED_LAZY
>
> 2) Set TIF_NEED_RESCHED when the task did not react on
> TIF_NEED_RESCHED_LAZY within a tick
>
> I know that's crude but it just works and obviously requires quite some
> refinement.
>
> So the way how you switch between preemption modes is to select when the
> scheduler sets TIF_NEED_RESCHED/TIF_NEED_RESCHED_LAZY. No static call
> switching at all.
>
> In full preemption mode it sets always TIF_NEED_RESCHED and otherwise it
> uses the LAZY bit first, grants some time and then gets out the hammer
> and sets TIF_NEED_RESCHED when the task did not reach a LAZY preemption
> point.
>
> Which means once the whole thing is in place then the whole
> PREEMPT_DYNAMIC along with NONE, VOLUNTARY, FULL can go away along with
> the cond_resched() hackery.
>
> So I think this series is backwards.
>
> It should add the LAZY muck with a Kconfig switch like I did in my PoC
> _first_. Once that is working and agreed on, the existing muck can be
> removed.

Yeah. I should have done it in the order in your PoC. Right now I'm
doing all of the stuff you describe above, but because there are far
too many structural changes, it's not clear to anybody what the code
is doing.
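
To make the evaluation table quoted above concrete, here is a rough
userspace sketch of it. This is not kernel code: the flag values, the
eval_point enum and should_resched() are names made up for illustration,
and the real checks live in the entry code and the preempt_count
handling.

```c
#include <stdbool.h>

/* Hypothetical stand-ins for the two TIF bits; the values are illustrative. */
enum {
	NEED_RESCHED      = 1u << 0,
	NEED_RESCHED_LAZY = 1u << 1,
};

/* The four evaluation points from the quoted table (names are assumptions). */
enum eval_point {
	RET_TO_USER,		/* return to user space */
	RET_TO_KERNEL,		/* irq exit back into kernel code */
	PREEMPT_COUNT_ZERO,	/* preempt_count 1->0 transition */
	NEED_RESCHED_CHECK,	/* explicit need_resched() callers */
};

/* Returns true if the pending flags cause a reschedule at this point. */
static bool should_resched(unsigned int flags, enum eval_point where)
{
	switch (where) {
	case RET_TO_USER:
	case NEED_RESCHED_CHECK:
		/* Both the regular and the lazy bit are honoured here. */
		return (flags & (NEED_RESCHED | NEED_RESCHED_LAZY)) != 0;
	case RET_TO_KERNEL:
	case PREEMPT_COUNT_ZERO:
		/* Only the regular bit preempts running kernel code. */
		return (flags & NEED_RESCHED) != 0;
	}
	return false;
}
```

So should_resched(NEED_RESCHED_LAZY, RET_TO_KERNEL) is false while
should_resched(NEED_RESCHED_LAZY, RET_TO_USER) is true, which is the
N/Y split in the LAZY_RESCHED row.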

Okay, so for the next version let me limit the series to just the
scheduler changes which can be orthogonal to the old models (basically
a new scheduler model PREEMPT_AUTO).

Once that is agreed on, the other models can be removed (or expressed
in terms of PREEMPT_AUTO.)
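
Roughly, the scheduler side of that would look like the sketch below.
It is a userspace toy under stated assumptions, not the actual patch:
struct task, request_resched(), tick() and did_schedule() are
hypothetical names, and the "escalate at the next tick" policy is the
crude one from the PoC description, not a proposal for the final
heuristic.

```c
#include <stdbool.h>

/* Hypothetical stand-ins for the two TIF bits; the values are illustrative. */
enum {
	NEED_RESCHED      = 1u << 0,
	NEED_RESCHED_LAZY = 1u << 1,
};

/* Preemption model, selected purely by which bit the scheduler sets. */
enum preempt_model { MODEL_NONE, MODEL_VOLUNTARY, MODEL_FULL };

/* Toy task: only the pending reschedule flags matter here. */
struct task {
	unsigned int flags;
};

/* Scheduler wants the task off the CPU: pick the bit based on the model. */
static void request_resched(struct task *t, enum preempt_model model)
{
	if (model == MODEL_FULL)
		t->flags |= NEED_RESCHED;	/* preempt at the next preemption point */
	else
		t->flags |= NEED_RESCHED_LAZY;	/* wait for a lazy evaluation point */
}

/* Tick: if the lazy request is still pending, get out the hammer. */
static void tick(struct task *t)
{
	if (t->flags & NEED_RESCHED_LAZY)
		t->flags |= NEED_RESCHED;
}

/* The task went through schedule(): both requests are satisfied. */
static void did_schedule(struct task *t)
{
	t->flags &= ~(unsigned int)(NEED_RESCHED | NEED_RESCHED_LAZY);
}
```

The point is that switching models only changes what request_resched()
sets; none of the preemption points themselves need to be patched.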

--
ankur