Re: [PATCH 26/30] sched: handle preempt=voluntary under PREEMPT_AUTO

From: Paul E. McKenney
Date: Thu Mar 07 2024 - 19:42:47 EST


On Thu, Mar 07, 2024 at 07:15:35PM -0500, Joel Fernandes wrote:
>
>
> On 3/7/2024 2:01 PM, Paul E. McKenney wrote:
> > On Wed, Mar 06, 2024 at 03:42:10PM -0500, Joel Fernandes wrote:
> >> Hi Ankur,
> >>
> >> On 3/5/2024 3:11 AM, Ankur Arora wrote:
> >>>
> >>> Joel Fernandes <joel@xxxxxxxxxxxxxxxxx> writes:
> >>>
> >> [..]
> >>>> IMO, just kill 'voluntary' if PREEMPT_AUTO is enabled. There is no
> >>>> 'voluntary' business because
> >>>> 1. The behavior vs =none is to allow higher scheduling class to preempt, it
> >>>> is not about the old voluntary.
> >>>
> >>> What do you think about folding the higher scheduling class preemption logic
> >>> into preempt=none? As Juri pointed out, prioritization of at least the leftmost
> >>> deadline task needs to be done for correctness.
> >>>
> >>> (That'll get rid of the current preempt=voluntary model, at least until
> >>> there's a separate use for it.)
> >>
> >> Yes I am all in support for that. It's less confusing for the user as well, and
> >> scheduling higher priority class at the next tick for preempt=none sounds good
> >> to me. That is still an improvement for folks using SCHED_DEADLINE for whatever
> >> reason, with a vanilla CONFIG_PREEMPT_NONE=y kernel. :-P. If we want a new mode
> >> that is more aggressive, it could be added in the future.
> >
> > This would be something that happens only after removing the cond_resched()
> > functionality from might_sleep(), correct?
>
> Firstly, maybe I misunderstood Ankur completely. Re-reading his comments above,
> he seems to be suggesting preempting instantly for higher scheduling CLASSES
> even for preempt=none mode, without having to wait till the next
> scheduling-clock interrupt. Not sure if that makes sense to me, I was asking not
> to treat "higher class" any differently than "higher priority" for preempt=none.
>
> And if SCHED_DEADLINE has a problem with that, then it already has that problem
> with CONFIG_PREEMPT_NONE=y kernels, so a higher class needs no more special
> treatment than a higher priority within the same class gets. Ankur/Juri?
>
> Re: cond_resched(), I did not follow you Paul, why does removing the proposed
> preempt=voluntary mode (i.e. dropping this patch) have to happen only after
> cond_resched()/might_sleep() modifications?

Because right now, one large difference between CONFIG_PREEMPT_NONE
and CONFIG_PREEMPT_VOLUNTARY is that for the latter might_sleep() is a
preemption point, but not for the former.

But if might_sleep() becomes debug-only, then there will no longer be
this difference.
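
To make that concrete, here is a rough userspace model of the distinction.
It is only a sketch and not kernel code: struct cpu_model, the enum, and
the model_*() helpers are all names made up for illustration, and the
might_sleep_debug_only flag stands in for the proposed change.

```c
#include <stdbool.h>

/* Hypothetical stand-ins for the two build-time preemption models. */
enum preempt_model { MODEL_NONE, MODEL_VOLUNTARY };

/* Hypothetical per-CPU state, just enough to show the difference. */
struct cpu_model {
	enum preempt_model model;
	bool need_resched;           /* a wakeup has asked for the CPU */
	bool might_sleep_debug_only; /* assumed: the proposed change applied */
	int reschedules;             /* how many times we gave up the CPU */
};

/* Models cond_resched(): reschedule only if one was requested. */
static int model_cond_resched(struct cpu_model *cpu)
{
	if (!cpu->need_resched)
		return 0;
	cpu->need_resched = false;
	cpu->reschedules++;
	return 1;
}

/*
 * Models might_sleep(). The debug check (sleeping in atomic context)
 * would run in every configuration and is omitted here. Returns 1 if
 * the call acted as a preemption point.
 */
static int model_might_sleep(struct cpu_model *cpu)
{
	if (cpu->might_sleep_debug_only)
		return 0;	/* debug-only: never reschedules */
	if (cpu->model == MODEL_VOLUNTARY)
		return model_cond_resched(cpu);
	return 0;		/* MODEL_NONE: not a preemption point */
}
```

With need_resched set, model_might_sleep() reschedules only for
MODEL_VOLUNTARY. Once might_sleep_debug_only is set, both models behave
identically at this call site, which is the point above.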

Thanx, Paul