Re: [PATCH v5 6/7] sched/deadline: Deferrable dl server

From: Daniel Bristot de Oliveira
Date: Tue Nov 07 2023 - 12:38:38 EST


On 11/7/23 17:47, Steven Rostedt wrote:
> On Mon, 6 Nov 2023 16:37:32 -0500
> Joel Fernandes <joel@xxxxxxxxxxxxxxxxx> wrote:
>
>> Say CFS-server runtime is 0.3s and period is 1s.
>>
>> At 0.7s, 0-laxity timer fires. CFS runs for 0.29s, then sleeps for
>> 0.005s and wakes up at 0.295s (its remaining runtime is 0.01s at this
>> point which is > the "time till deadline" of 0.005s)
>>
>> Now the runtime of the CFS-server will be replenished to the full 0.3s
>> (due to CBS) and the deadline pushed out.
>>
>> The end result is, the total runtime that the CFS-server actually gets
>> is 0.595s (though yes it did sleep for 5ms in between, still that's
>> tiny -- say if it briefly blocked on a kernel mutex). That's almost
>> double the allocated runtime.
>>
>> This is just theoretical and I have yet to see if it is actually an
>> issue in practice.
>
> Let me see if I understand what you are asking. By pushing the execution of
> the CFS-server to the end of its period, if it was briefly blocked and
> was not able to consume all of its zerolax time, its bandwidth gets
> refreshed. Then it can run again, basically doubling its total time.
>
> But this is basically saying that it ran for its runtime at the start of
> one period and at the beginning of another, right?
>
> Is that an issue? The CFS-server is still just consuming its time per
> period. That means that an RT task was starving the system that much to
> push it forward too much anyway. I wonder if we just document this
> behavior, if that would be enough?

The code is not doing what I intended because I thought it was doing overload
control on the replenishment, but it is not (my bad).

Joel is seeing this timeline:

- w=waiting
- r=running
- s=sleeping
- T=throttled
- 3/10 reservation (30%).

|wwwwwwwwwwwwwwwwwwwwwwwwwwwwwwwwwwwwwwwwwwwwwwwwwwwwwwwwwwwwwwwwwwww|rrrrrrrrrrrrrrrrrrrrrrrrrrr|s|rrrrrrrrr+rrrrrrrr+rrrrrrrrr|TTTTTTTTTT <CPU
|___________________________period 1_______________________________________________________________|________period 2_______________________ < internal-period
0---------1---------2---------3---------4---------5---------6--------7--------8---------9----------10.......11.......12.........13......... < Real-time

It is not actually that bad, because the ~2x runtime is spread over two periods.
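A quick check of the "over two periods" arithmetic with Joel's numbers, in milliseconds (0.3s/1s reservation, 0-laxity timer at 700, 290ms of runtime, a 5ms sleep). This is a minimal sketch assuming the current behaviour, where the wakeup just before the deadline triggers a full CBS replenishment; `struct split` and `current_behaviour()` are hypothetical names for illustration, not kernel code.

```c
#include <stdint.h>

/* Hypothetical helper, not kernel code. All times in milliseconds. */
struct split {
	int64_t in_period1;   /* runtime consumed before the original deadline */
	int64_t in_period2;   /* runtime granted by the wakeup replenishment */
	int64_t new_deadline; /* absolute deadline after the replenishment */
};

/*
 * Current (unintended) behaviour: the server is deferred to the 0-laxity
 * point, runs for 'ran', sleeps for 'slept', and the wakeup just before
 * the deadline fails the CBS check, so it gets a full replenishment and
 * a deadline pushed one period past the wakeup.
 */
struct split current_behaviour(int64_t dl_runtime, int64_t dl_period,
			       int64_t ran, int64_t slept)
{
	int64_t zerolax = dl_period - dl_runtime; /* 0-laxity timer fires */
	int64_t wakeup = zerolax + ran + slept;   /* wakes just before the deadline */
	struct split s = {
		.in_period1 = ran,
		.in_period2 = dl_runtime,          /* full replenishment */
		.new_deadline = wakeup + dl_period, /* deadline pushed out */
	};

	return s;
}
```

With `current_behaviour(300, 1000, 290, 5)` the server gets 290 + 300 = 590ms of runtime between t=700 and t=1295, but 290ms falls in the first period and 300ms in the second (deadline 1995), so neither period exceeds its 300ms budget.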

But it is not what I intended... I intended this:

|wwwwwwwwwwwwwwwwwwwwwwwwwwwwwwwwwwwwwwwwwwwwwwwwwwwwwwwwwwwwwwwwwwww|rrrrrrrrrrrrrrrrrrrrrrrrrrrrsr|TTTTTTTTTT[...]TTTTTTTTTTT|rrrrrrrrrrrrrrrrrrrrrrrrrrrrrrr|TTTTTTT
|___________________________period 1_________________________________|_________period 2________________________[...]___________|___period 3____________________|[.... internal-period
0---------1---------2---------3---------4---------5---------6--------7--------8---------9----------10.......11.[...]16.........17........18........19........20|[.... < Real-time
---------------------------------------------------------------------+---------------------------------------------------------|
| +new period
+30/30>30/100, thus new period.

At replenishment time, if runtime left / period left > dl_runtime / dl_period,
replenish with a new period, to avoid adding too much pressure on CBS/EDF.
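The check described above is the CBS overflow test: the server may keep its current deadline only if runtime_left / (deadline - now) <= dl_runtime / dl_period. A minimal sketch, assuming a simplified server with implicit deadlines (relative deadline equal to the period) and all times in one arbitrary unit; `struct server`, `server_overflow()` and `server_replenish()` are hypothetical names, and the real kernel code differs (for instance, it scales values down before multiplying to avoid overflow).

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical, simplified server state; all times in the same unit. */
struct server {
	uint64_t dl_runtime; /* reserved runtime per period */
	uint64_t dl_period;  /* reservation period (== relative deadline) */
	uint64_t runtime;    /* runtime left in the current period */
	uint64_t deadline;   /* absolute deadline of the current period */
};

/*
 * True if runtime / (deadline - now) > dl_runtime / dl_period, i.e. the
 * leftover runtime cannot be served before the current deadline without
 * exceeding the reserved bandwidth. Cross-multiplied to avoid division.
 * Caller must ensure now < deadline.
 */
static bool server_overflow(const struct server *s, uint64_t now)
{
	uint64_t left = s->runtime * s->dl_period;
	uint64_t right = (s->deadline - now) * s->dl_runtime;

	return left > right;
}

/*
 * Overload control at replenishment time: if the deadline has passed or
 * the leftover bandwidth is too high, start a new period from 'now'
 * instead of reusing the old deadline.
 */
void server_replenish(struct server *s, uint64_t now)
{
	if (now >= s->deadline || server_overflow(s, now)) {
		s->deadline = now + s->dl_period;
		s->runtime = s->dl_runtime;
	}
}
```

With the 30/100 numbers from the annotation, a server still holding 30 units at t=70 has 30/30 > 30/100, so `server_replenish()` moves its deadline to 170. Applied to Joel's wakeup at 995ms after such a new period (deadline 1700, 10ms left), 10/705 < 0.3, so the server keeps its 10ms instead of getting another 300ms; without the new period (deadline still 1000), 10/5 > 0.3 and it would be fully replenished.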

One might say: but then the task period is different... or out of sync...
but it is not a problem: look at the "real-time"... the task starts and
runs at "deadline - runtime", emulating "zerolax"
(note: I do not like the term zerolax here... but (Thomas voice:) whatever :-)).
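The "deadline - runtime" start point is the zero-laxity instant: the latest time the server can start and still consume its remaining runtime by its deadline. A minimal sketch; `zerolax_time()` is a hypothetical helper, not the kernel's timer code.

```c
#include <stdint.h>

/*
 * Zero-laxity instant: starting here and running without interruption,
 * the remaining runtime is consumed exactly at the deadline. Clamped to
 * 0 so the unsigned subtraction cannot wrap.
 */
uint64_t zerolax_time(uint64_t deadline, uint64_t runtime_left)
{
	return runtime_left >= deadline ? 0 : deadline - runtime_left;
}
```

For the 3/10 reservation in the timelines, `zerolax_time(10, 3)` is 7, where the `r`s begin; with Joel's numbers, `zerolax_time(1000, 300)` is 700ms.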

One could say: in the presence of deadline tasks, this timeline will be different...

But that is intentional, as we do not want the fair server to break DL. But more
than that, if one has DL tasks, FIFO latency "property" is broken, and they should
just disable the defer option....

That is what I mentioned in the log:

"If the fair server reaches the zerolax time without consuming
its runtime, the server will be boosted, following CBS rules
(thus without breaking SCHED_DEADLINE)."

By "the rule" I meant doing the overload check... I thought it was
there already, but it was not... there was no need for it before.

I am working on it... it is a simple change (but I need to test).

-- Daniel