Re: [PATCH v5 6/7] sched/deadline: Deferrable dl server

From: Joel Fernandes
Date: Mon Nov 06 2023 - 14:32:19 EST


Hi Daniel,

On Sat, Nov 4, 2023 at 6:59 AM Daniel Bristot de Oliveira
<bristot@xxxxxxxxxx> wrote:
>
> Among the motivations for the DL servers is the real-time throttling
> mechanism. This mechanism works by throttling the rt_rq after
> running for a long period without leaving space for fair tasks.
>
> The base dl server avoids this problem by boosting fair tasks instead
> of throttling the rt_rq. The point is that it boosts without waiting
> for potential starvation, causing some non-intuitive cases.
>
> For example, an IRQ dispatches two tasks on an idle system, a fair
> and an RT. The DL server will be activated, running the fair task
> before the RT one. This problem can be avoided by deferring the
> dl server activation.
>
> By setting the zerolax option, the dl_server will dispatch a
> SCHED_DEADLINE reservation with replenished runtime, but throttled.
>
> The dl_timer will be set for (period - runtime) ns from the start time,
> thus boosting the fair rq at its 0-laxity time with respect to the
> rt_rq.
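
A minimal user-space sketch of the 0-laxity arithmetic, assuming an implicit-deadline reservation (deadline == start + period). The function names laxity_ns() and zerolax_expiry_ns() are hypothetical; this is not the kernel code:

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical model, not kernel code: laxity is the slack left at
 * time 'now' before the remaining runtime no longer fits ahead of the
 * absolute deadline. Negative laxity means the deadline can be missed.
 */
static int64_t laxity_ns(uint64_t now, uint64_t deadline, uint64_t runtime)
{
	return (int64_t)deadline - (int64_t)now - (int64_t)runtime;
}

/*
 * The zero-laxity instant, where the zerolax dl_timer would fire:
 * the last moment at which the full remaining runtime still fits.
 */
static uint64_t zerolax_expiry_ns(uint64_t deadline, uint64_t runtime)
{
	assert(runtime <= deadline);
	return deadline - runtime;
}
```

With the reservation starting at t = 0, a 1 s period (so deadline = 1 s) and 50 ms of runtime (illustrative values only), zerolax_expiry_ns(1000000000, 50000000) is 950000000, i.e. (period - runtime) from the start, and laxity_ns() at that instant is 0.
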
>
> If the fair scheduler has the opportunity to run while waiting
> for zerolax time, the dl server runtime will be consumed. If
> the runtime is completely consumed before the zerolax time, the
> server will be replenished while still in a throttled state. Then,
> the dl_timer will be reset to the new zerolax time.
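
The accounting described above can be modeled roughly as below. This is a hypothetical user-space sketch (struct zl_server, zl_arm() and zl_account() are made-up names) that assumes an implicit deadline and ignores PI boosting and the real hrtimer; the refill loop mirrors the CBS-style "push the deadline by one period per refill" replenishment:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical, simplified model of a zerolax server (not kernel code). */
struct zl_server {
	int64_t dl_runtime;	/* reserved runtime per period (ns) */
	int64_t dl_period;	/* period, used as the relative deadline (ns) */
	int64_t runtime;	/* runtime left in the current period (ns) */
	int64_t deadline;	/* current absolute deadline (ns) */
	bool throttled;		/* waiting for the zerolax timer */
	int64_t timer;		/* absolute expiry of the modeled dl_timer (ns) */
};

/* Throttle the server and arm the modeled timer at its zero-laxity time. */
static void zl_arm(struct zl_server *s)
{
	assert(s->runtime > 0);
	s->throttled = true;
	s->timer = s->deadline - s->runtime;
}

/*
 * Charge 'delta' ns of fair-task execution that happened while the
 * server was still throttled, waiting for its zerolax time. In this
 * model, partial consumption leaves the armed timer alone. Exhausting
 * the runtime triggers a replenishment (deadline pushed by one period
 * per refill); the server stays throttled and is re-armed at the new
 * zero-laxity time.
 */
static void zl_account(struct zl_server *s, int64_t delta)
{
	assert(s->throttled && delta >= 0);
	s->runtime -= delta;
	if (s->runtime > 0)
		return;

	while (s->runtime <= 0) {
		s->deadline += s->dl_period;
		s->runtime += s->dl_runtime;
	}
	zl_arm(s);
}
```

For example, with 50 ms every 1 s, fair tasks running 20 ms and then 30 ms before the first zerolax time exhaust the runtime; the model then moves the deadline to 2 s, refills 50 ms, and re-arms the timer at 1.95 s while still throttled.
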
>
> If the fair server reaches the zerolax time without consuming
> its runtime, the server will be boosted, following CBS rules
> (thus without breaking SCHED_DEADLINE).
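
The CBS rule referred to here is, if I read deadline.c right, roughly what dl_entity_overflow() checks. Below is a heavily simplified user-space sketch of that check (no DL_SCALE shifting, no PI handling); cbs_overflow() and cbs_needs_new_deadline() are hypothetical names:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/*
 * Simplified sketch of the CBS wakeup check, not kernel code. Keeping
 * the current (deadline, runtime) pair at 'now' is only allowed if
 * runtime / (deadline - now) does not exceed the reserved bandwidth
 * dl_runtime / dl_deadline. Cross-multiplied to avoid division; assumes
 * deadline > now and that the products fit in 64 bits.
 */
static bool cbs_overflow(uint64_t now, uint64_t deadline, uint64_t runtime,
			 uint64_t dl_runtime, uint64_t dl_deadline)
{
	assert(deadline > now);
	return runtime * dl_deadline > (deadline - now) * dl_runtime;
}

/* A new deadline is needed if the old one has passed or would overflow. */
static bool cbs_needs_new_deadline(uint64_t now, uint64_t deadline,
				   uint64_t runtime, uint64_t dl_runtime,
				   uint64_t dl_deadline)
{
	if (deadline <= now)
		return true;
	return cbs_overflow(now, deadline, runtime, dl_runtime, dl_deadline);
}
```

With 50 ms reserved every 1 s, a server holding 50 ms with 500 ms left to its deadline would demand 10% bandwidth instead of 5%, so this check asks for a new deadline; holding only 20 ms at the same point (4%) it may keep the old one.
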
>
> Signed-off-by: Daniel Bristot de Oliveira <bristot@xxxxxxxxxx>
> ---
> include/linux/sched.h | 2 +
> kernel/sched/deadline.c | 100 +++++++++++++++++++++++++++++++++++++++-
> kernel/sched/fair.c | 3 ++
> 3 files changed, 103 insertions(+), 2 deletions(-)
>
> diff --git a/include/linux/sched.h b/include/linux/sched.h
> index 5ac1f252e136..56e53e6fd5a0 100644
> --- a/include/linux/sched.h
> +++ b/include/linux/sched.h
> @@ -660,6 +660,8 @@ struct sched_dl_entity {
> unsigned int dl_non_contending : 1;
> unsigned int dl_overrun : 1;
> unsigned int dl_server : 1;
> + unsigned int dl_zerolax : 1;
> + unsigned int dl_zerolax_armed : 1;
>
> /*
> * Bandwidth enforcement timer. Each -deadline task has its
> diff --git a/kernel/sched/deadline.c b/kernel/sched/deadline.c
> index 1d7b96ca9011..69ee1fbd60e4 100644
> --- a/kernel/sched/deadline.c
> +++ b/kernel/sched/deadline.c
> @@ -772,6 +772,14 @@ static inline void replenish_dl_new_period(struct sched_dl_entity *dl_se,
> /* for non-boosted task, pi_of(dl_se) == dl_se */
> dl_se->deadline = rq_clock(rq) + pi_of(dl_se)->dl_deadline;
> dl_se->runtime = pi_of(dl_se)->dl_runtime;
> +
> + /*
> + * If it is a zerolax reservation, throttle it.
> + */
> + if (dl_se->dl_zerolax) {
> + dl_se->dl_throttled = 1;
> + dl_se->dl_zerolax_armed = 1;
> + }
> }
>
> /*
> @@ -828,6 +836,7 @@ static inline void setup_new_dl_entity(struct sched_dl_entity *dl_se)
> * could happen are, typically, a entity voluntarily trying to overcome its
> * runtime, or it just underestimated it during sched_setattr().
> */
> +static int start_dl_timer(struct sched_dl_entity *dl_se);
> static void replenish_dl_entity(struct sched_dl_entity *dl_se)
> {
> struct dl_rq *dl_rq = dl_rq_of_se(dl_se);
> @@ -874,6 +883,28 @@ static void replenish_dl_entity(struct sched_dl_entity *dl_se)
> dl_se->dl_yielded = 0;
> if (dl_se->dl_throttled)
> dl_se->dl_throttled = 0;
> +
> + /*
> + * If this is the replenishment of a zerolax reservation,
> + * clear the flag and return.
> + */
> + if (dl_se->dl_zerolax_armed) {
> + dl_se->dl_zerolax_armed = 0;
> + return;
> + }
> +
> + /*
> + * At this point, if the zerolax server is not armed, and the deadline
> + * is in the future, throttle the server and arm the zerolax timer.
> + */
> + if (dl_se->dl_zerolax &&
> + dl_time_before(dl_se->deadline - dl_se->runtime, rq_clock(rq))) {
> + if (!is_dl_boosted(dl_se)) {
> + dl_se->dl_zerolax_armed = 1;
> + dl_se->dl_throttled = 1;
> + start_dl_timer(dl_se);
> + }
> + }
> }
>
> /*
> @@ -1024,6 +1055,13 @@ static void update_dl_entity(struct sched_dl_entity *dl_se)
> }
>
> replenish_dl_new_period(dl_se, rq);
> + } else if (dl_server(dl_se) && dl_se->dl_zerolax) {
> + /*
> + * The server can still use its previous deadline, so throttle
> + * and arm the zero-laxity timer.
> + */
> + dl_se->dl_zerolax_armed = 1;
> + dl_se->dl_throttled = 1;
> }
> }
>
> @@ -1056,8 +1094,20 @@ static int start_dl_timer(struct sched_dl_entity *dl_se)
> * We want the timer to fire at the deadline, but considering
> * that it is actually coming from rq->clock and not from
> * hrtimer's time base reading.
> + *
> + * The zerolax reservation will have its timer set to the
> + * deadline - runtime. At that point, the CBS rule will decide
> + * if the current deadline can be used, or if a replenishment
> + * is required to avoid adding too much pressure on the system
> + * (current u > U).
> */
> - act = ns_to_ktime(dl_next_period(dl_se));
> + if (dl_se->dl_zerolax_armed) {
> + WARN_ON_ONCE(!dl_se->dl_throttled);
> + act = ns_to_ktime(dl_se->deadline - dl_se->runtime);

Just a question: if dl_se->deadline - dl_se->runtime is large, does
that mean the server activation is pushed much further into the
future? Say I want to give CFS 30%; then it will take 70% of the
period before CFS preempts RT, thus "starving" CFS for that duration.
I think that's OK for smaller periods and runtimes, though.

It does still reserve the required CFS bandwidth, so it is probably
OK, though it perhaps lets RT run more up front. For example, if CFS
tasks are not CPU-bound and only wake up occasionally, AFAICS they
will always be hit by that 70% latency, which may be large for large
periods and small runtimes.
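
To put rough numbers on this, a back-of-the-envelope sketch assuming an RT hog on the CPU, an implicit deadline, and a server activated with its full runtime. The helper names and the example configurations are illustrative only:

```c
#include <assert.h>
#include <stdint.h>

/*
 * Back-of-the-envelope model, not kernel code: with an RT task hogging
 * the CPU, a freshly replenished zerolax server (implicit deadline, full
 * runtime) is only boosted after (period - runtime) ns, so a CFS task
 * that wakes right when the server is activated waits that long.
 */
static uint64_t zerolax_wait_ns(uint64_t period_ns, uint64_t runtime_ns)
{
	assert(runtime_ns <= period_ns);
	return period_ns - runtime_ns;
}

/* The same wait, as a whole percentage of the period (rounded down). */
static unsigned int zerolax_wait_pct(uint64_t period_ns, uint64_t runtime_ns)
{
	return (unsigned int)(zerolax_wait_ns(period_ns, runtime_ns) * 100 /
			      period_ns);
}
```

So 30% of a 10 ms period means a 7 ms wait, 30% of a 1 s period means 700 ms, and a 50 ms runtime in a 1 s period means 950 ms (95% of the period).
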

We're currently trying these patches on ChromeOS as well.

I've just started going over the patch to understand it. Looks nice so
far, and thanks,

- Joel