Re: [PATCH 26/30] sched: handle preempt=voluntary under PREEMPT_AUTO

From: Paul E. McKenney
Date: Fri Mar 08 2024 - 16:33:48 EST


On Thu, Mar 07, 2024 at 08:22:30PM -0800, Ankur Arora wrote:
>
> Paul E. McKenney <paulmck@xxxxxxxxxx> writes:
>
> > On Thu, Mar 07, 2024 at 07:15:35PM -0500, Joel Fernandes wrote:
> >>
> >>
> >> On 3/7/2024 2:01 PM, Paul E. McKenney wrote:
> >> > On Wed, Mar 06, 2024 at 03:42:10PM -0500, Joel Fernandes wrote:
> >> >> Hi Ankur,
> >> >>
> >> >> On 3/5/2024 3:11 AM, Ankur Arora wrote:
> >> >>>
> >> >>> Joel Fernandes <joel@xxxxxxxxxxxxxxxxx> writes:
> >> >>>
> >> >> [..]
> >> >>>> IMO, just kill 'voluntary' if PREEMPT_AUTO is enabled. There is no
> >> >>>> 'voluntary' business because
> >> >>>> 1. The behavior vs =none is to allow higher scheduling class to preempt, it
> >> >>>> is not about the old voluntary.
> >> >>>
> >> >>> What do you think about folding the higher scheduling class preemption logic
> >> >>> into preempt=none? As Juri pointed out, prioritization of at least the leftmost
> >> >>> deadline task needs to be done for correctness.
> >> >>>
> >> >>> (That'll get rid of the current preempt=voluntary model, at least until
> >> >>> there's a separate use for it.)
> >> >>
> >> >> Yes I am all in support for that. It's less confusing for the user as well, and
> >> >> scheduling higher priority class at the next tick for preempt=none sounds good
> >> >> to me. That is still an improvement for folks using SCHED_DEADLINE for whatever
> >> >> reason, with a vanilla CONFIG_PREEMPT_NONE=y kernel. :-P. If we want a new mode
> >> >> that is more aggressive, it could be added in the future.
> >> >
> >> > This would be something that happens only after removing the
> >> > cond_resched() functionality from might_sleep(), correct?
> >>
> >> Firstly, maybe I misunderstood Ankur completely. Re-reading his comments above,
> >> he seems to be suggesting preempting instantly for higher scheduling CLASSES
> >> even for preempt=none mode, without having to wait till the next
> >> scheduling-clock interrupt. Not sure if that makes sense to me, I was asking not
> >> to treat "higher class" any differently than "higher priority" for preempt=none.
> >>
> >> And if SCHED_DEADLINE has a problem with that, then it already happens so with
> >> CONFIG_PREEMPT_NONE=y kernels, so there is no need for special treatment of a
> >> higher class beyond that given to higher priority within the same class. Ankur/Juri?
> >>
> >> Re: cond_resched(), I did not follow you Paul, why does removing the proposed
> >> preempt=voluntary mode (i.e. dropping this patch) have to happen only after
> >> cond_resched()/might_sleep() modifications?
> >
> > Because right now, one large difference between CONFIG_PREEMPT_NONE
> > and CONFIG_PREEMPT_VOLUNTARY is that for the latter might_sleep() is a
> > preemption point, but not for the former.
>
> True. But, there is no difference between either of those with
> PREEMPT_AUTO=y (at least right now).
>
> For (PREEMPT_AUTO=y, PREEMPT_VOLUNTARY=y, DEBUG_ATOMIC_SLEEP=y),
> might_sleep() is:
>
> # define might_resched() do { } while (0)
> # define might_sleep() \
> 	do { __might_sleep(__FILE__, __LINE__); might_resched(); } while (0)
>
> And, cond_resched() for (PREEMPT_AUTO=y, PREEMPT_VOLUNTARY=y,
> DEBUG_ATOMIC_SLEEP=y):
>
> static inline int _cond_resched(void)
> {
> 	klp_sched_try_switch();
> 	return 0;
> }
> #define cond_resched() ({			\
> 	__might_resched(__FILE__, __LINE__, 0);	\
> 	_cond_resched();			\
> })
>
> And, no change for (PREEMPT_AUTO=y, PREEMPT_NONE=y, DEBUG_ATOMIC_SLEEP=y).

As long as it is easy to restore the prior cond_resched() functionality
for testing in the meantime, I should be OK. For example, it would
be great to have the commit removing the old functionality from
cond_resched() at the end of the series.
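
For concreteness, the differences discussed in this thread can be
summarized in a small user-space model. This is only a sketch, not
kernel code: the struct and the function names are made up for
illustration, and it assumes that without PREEMPT_AUTO, cond_resched()
can reschedule under both the none and voluntary models.

```c
#include <stdbool.h>
#include <stdio.h>

/*
 * User-space model only; nothing here is kernel code.
 * The struct and function names are hypothetical.
 */
struct preempt_cfg {
	bool preempt_auto;      /* models CONFIG_PREEMPT_AUTO=y */
	bool preempt_voluntary; /* models CONFIG_PREEMPT_VOLUNTARY=y, else NONE */
};

/* Is might_sleep() a preemption point in this configuration? */
static bool might_sleep_can_resched(struct preempt_cfg cfg)
{
	if (cfg.preempt_auto)
		return false;	/* might_resched() is an empty do-while */
	return cfg.preempt_voluntary;	/* only VOLUNTARY reschedules here */
}

/* Can cond_resched() itself invoke the scheduler in this configuration? */
static bool cond_resched_can_resched(struct preempt_cfg cfg)
{
	/* Under PREEMPT_AUTO, _cond_resched() just returns 0. */
	return !cfg.preempt_auto;
}

/* Print one row of the summary table. */
static void show_cfg(const char *name, struct preempt_cfg cfg)
{
	printf("%-28s might_sleep=%d cond_resched=%d\n", name,
	       might_sleep_can_resched(cfg), cond_resched_can_resched(cfg));
}
```

Calling show_cfg() once per combination of the two flags prints the
whole matrix. In this model the two PREEMPT_AUTO=y rows come out
identical, which is the point made above about none and voluntary.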

Thanx, Paul