Re: [PATCH RFC] sched/deadline: Use the revised wakeup rule for suspending constrained dl tasks

From: Peter Zijlstra
Date: Thu May 04 2017 - 10:18:12 EST


On Mon, Apr 24, 2017 at 05:18:35PM +0200, Daniel Bristot de Oliveira wrote:
> We have been facing some problems with self-suspending constrained
> deadline tasks. The main reason is that the original CBS was not
> designed for such sort of tasks.
>
> One problem reported by Xunlei Pang takes place when a task
> suspends, and then is awakened before the deadline, but so close
> to the deadline that its remaining runtime can cause the task
> to have an absolute density higher than allowed. In such situation,
> the original CBS assumes that the task is facing an early activation,
> and so it replenishes the task and set another deadline, one deadline
> in the future. This rule works fine for implicit deadline tasks.
> Moreover, it allows the system to adapt the period of a task in which
> the external event source suffered from a clock drift.
>
> However, this opens the window for bandwidth leakage for constrained
> deadline tasks. For instance, a task with the following parameters:
>
> runtime = 5 ms
> deadline = 7 ms
> [density] = 5 / 7 = 0.71
> period = 1000 ms
>
> If the task runs for 1 ms, and then suspends for another 1ms,
> it will be awakened with the following parameters:
>
> remaining runtime = 4
> laxity = 5
>
> presenting a absolute density of 4 / 5 = 0.80.
>
> In this case, the original CBS would assume the task had an early
> wakeup. Then, CBS will reset the runtime, and the absolute deadline will
> be postponed by one relative deadline, allowing the task to run.
>
> The problem is that, if the task runs this pattern forever, it will keep
> receiving bandwidth, being able to run 1ms every 2ms. Following this
> behavior, the task would be able to run 500 ms in 1 sec. Thus running
> more than the 5 ms / 1 sec the admission control allowed it to run.
>
> Trying to address the self-suspending case, Luca Abeni, Giuseppe
> Lipari, and Juri Lelli [1] revisited the CBS in order to deal with
> self-suspending tasks. In the new approach, rather than
> replenishing/postponing the absolute deadline, the revised wakeup rule
> adjusts the remaining runtime, reducing it to fit into the allowed
> density.
>
> A resumed version of the idea is:
>
> At a given time t, the maximum absolute density of a task cannot be
> higher than its relative density, that is:
>
> runtime / (deadline - t) <= dl_runtime / dl_deadline
>
> Knowing the laxity of a task (deadline - t), it is possible to move
> it to the other side of the equality, thus enabling to define max
> remaining runtime a task can use within the absolute deadline, without
> over-running the allowed density:
>
> runtime = (dl_runtime / dl_deadline) * (deadline - t)
>
> For instance, in our previous example, the task could still run:
>
> runtime = ( 5 / 7 ) * 4
> runtime = 2.85 ms
>
> Without causing damage for other deadline tasks. It is note worth that
> the laxity cannot be negative because that would cause a negative
> runtime. Thus, this patch depends on the patch:
>
> edf5835 sched/deadline: Throttle a constrained deadline task activated
> after the deadline

My git tree says that is:

df8eac8cafce ("sched/deadline: Throttle a constrained deadline task activated after the deadline")

> Which throttles a constrained deadline task activated after the
> deadline.
>
> Finally, it is also possible to use the revised wakeup rule for
> all other tasks, but that would require some more discussions
> about pros and cons.
>
> Reported-by: Xunlei Pang <xpang@xxxxxxxxxx>
> Signed-off-by: Daniel Bristot de Oliveira <bristot@xxxxxxxxxx>
> Cc: Xunlei Pang <xpang@xxxxxxxxxx>
> Cc: Ingo Molnar <mingo@xxxxxxxxxx>
> Cc: Peter Zijlstra <peterz@xxxxxxxxxxxxx>
> Cc: Juri Lelli <juri.lelli@xxxxxxx>
> Cc: Steven Rostedt <rostedt@xxxxxxxxxxx>
> Cc: Luca Abeni <luca.abeni@xxxxxxxxxxxxxxx>
> Cc: Tommaso Cucinotta <tommaso.cucinotta@xxxxxxxx>
> Cc: Romulo Silva de Oliveira <romulo.deoliveira@xxxxxxx>
> Cc: linux-kernel@xxxxxxxxxxxxxxx
> ---
> kernel/sched/deadline.c | 67 +++++++++++++++++++++++++++++++++++++++++++------
> 1 file changed, 60 insertions(+), 7 deletions(-)
>
> diff --git a/kernel/sched/deadline.c b/kernel/sched/deadline.c
> index a2ce590..71e5bcf 100644
> --- a/kernel/sched/deadline.c
> +++ b/kernel/sched/deadline.c
> @@ -484,13 +484,63 @@ static bool dl_entity_overflow(struct sched_dl_entity *dl_se,
> }
>
> /*
> + * Revised wakeup rule [1]: For self-suspending tasks, rather then
> + * re-initializing task's runtime and deadline, the revised wakeup
> + * rule adjusts the task's runtime to avoid the task to overrun its
> + * density.
> + *
> + * Reasoning: a task may overrun the density if:
> + * runtime / (deadline - t) > dl_runtime / dl_deadline

When reading that, I have the instant question: "why / how ?" I suspect
the blurb below (at update_dl_entity) has the answer, if so this can use
a reference thereto.

> + *
> + * Therefore, runtime can be adjusted to:
> + * runtime = (dl_runtime / dl_deadline) * (deadline - t)
> + *
> + * In such way that runtime will be equals to the maximum density
> + * the task can use without breaking any rule.
> + *
> + * [1] Luca Abeni, Giuseppe Lipari, and Juri Lelli. 2015. Constant
> + * bandwidth server revisited. SIGBED Rev. 11, 4 (January 2015), 19-24.
> + */
> +static void
> +update_dl_revised_wakeup(struct sched_dl_entity *dl_se, struct rq *rq)
> +{
> + u64 density = div64_u64(dl_se->dl_runtime << 20, dl_se->dl_deadline);
> + u64 laxity = dl_se->deadline - rq_clock(rq);
> +
> + BUG_ON(laxity < 0);

Compiler will make that go away, by virtue of laxity being unsigned.

> +
> + dl_se->runtime = density * laxity >> 20;
> +}
> +
/*
* XXX comment that explains what constrained is ? per the definition
* below that is any task where deadline != period?
*/
> +static inline bool dl_is_constrained(struct sched_dl_entity *dl_se)
> +{
> + return dl_se->dl_deadline < dl_se->dl_period;
> +}
> +
> +/*
> * When a -deadline entity is queued back on the runqueue, its runtime and
> * deadline might need updating.
> *
> + * Currently, we are using two different CBS rules, 1) the original CBS
> + * for implicit deadline tasks; 2) the revisited CBS for constrained
> + * deadline ones. The reason is that the original CBS can cause a constrained
> + * deadline task to be replenished deadline/period times in a period, in
> + * the worst case, hence allowing it to run more than runtime/period.
> + * In order to prevent this misbehave, the revisited CBS is used for
> + * constrained deadline tasks. In the revisited CBS, in the case of an
> + * overload, rather than replenishing & postponing the deadline, the
> + * remaining runtime of a task is reduced to avoid runtime overflow.

We use two difference CBS rules:

1) the original CBS rule for implicit deadline tasks;
2) the revised CBS rule for constrained deadline tasks.

( and here I have the question, wth is implicit / constrained )

The reason is that the original CBS rul can cause a constrained deadline
task to be replenished deadline/period times in a period (worst case),
hence allowing it to run more than runtime/period.

( the deadline/period thing could use a few words extra; in the example
above that divides to: 7/1000 = .007 which does not have an integer
component. You cannot replenish fractional times etc.. )

In order to prevent this, the revised CBS rule reduces the remaining
runtime (by limiting the max density).

> + *
> + * So, for implicit deadline tasks, the policy here is that the runtime &
> + * deadline of an entity are update if and only if., either:

updated ? IFF

> + * - the current deadline is in the past, or
> * - using the remaining runtime with the current deadline would make
> * the entity exceed its bandwidth.
> + *
> + * For constrained deadline tasks, the policy here is that the runtime
> + * is reduced to avoid exceeding its bandwidth if:

> + * - using the remaining runtime with the current deadline would make
> + * the entity exceed its bandwidth.
> */

In general the comment on comments is to add whitespace. That greatly
increases readability.

> static void update_dl_entity(struct sched_dl_entity *dl_se,
> struct sched_dl_entity *pi_se)
> @@ -500,6 +550,14 @@ static void update_dl_entity(struct sched_dl_entity *dl_se,
>
> if (dl_time_before(dl_se->deadline, rq_clock(rq)) ||
> dl_entity_overflow(dl_se, pi_se, rq_clock(rq))) {
> +
> + if (unlikely(dl_is_constrained(dl_se) &&
> + !dl_time_before(dl_se->deadline, rq_clock(rq)) &&

alignment is misleading, this is still inside the unlikely(.

> + !dl_se->dl_boosted)){
> + update_dl_revised_wakeup(dl_se, rq);
> + return;
> + }
> +
> dl_se->deadline = rq_clock(rq) + pi_se->dl_deadline;
> dl_se->runtime = pi_se->dl_runtime;
> }