Re: [PATCH] sched/rt: RT_RUNTIME_GREED sched feature

From: Daniel Bristot de Oliveira
Date: Mon Nov 07 2016 - 15:07:34 EST




On 11/07/2016 09:00 PM, Steven Rostedt wrote:
> On Mon, 7 Nov 2016 13:54:12 -0600 (CST)
> Christoph Lameter <cl@xxxxxxxxx> wrote:
>
>> On Mon, 7 Nov 2016, Steven Rostedt wrote:
>>
>>> On Mon, 7 Nov 2016 13:30:15 -0600 (CST)
>>> Christoph Lameter <cl@xxxxxxxxx> wrote:
>>>
>>>> SCHED_RR tasks alternately running on on cpu can cause endless deferral of
>>>> kworker threads. With the global effect of the OS processing reserved
>>>> it may be the case that the processor we are executing never gets any
>>>> time. And if that kworker threads role is releasing a mutex (like the
>>>> cgroup_lock) then deadlocks can result.
>>>
>>> I believe SCHED_RR tasks will still throttle if they use up too much of
>>> the CPU. But I still don't see how this patch helps your situation.
>>
>> The kworker thread will be able to make progress? Or am I not reading this
>> correctly?
>>
>
> If kworker is SCHED_OTHER, then it will be able to make progress if the
> RT tasks are throttled.
>
> What Daniel's patch does, is to turn off throttling of the RT tasks if
> there's no other task on the run queue.


Here in the example of two spinning RR tasks (f-22466 & f-22473) and an
other task (o-22506):
f-22466 [002] d... 79045.641364: sched_switch: prev_comm=f prev_pid=22466 prev_prio=94 prev_state=R ==> next_comm=o next_pid=22506 next_prio=120
o-22506 [002] d... 79045.690379: sched_switch: prev_comm=o prev_pid=22506 prev_prio=120 prev_state=R ==> next_comm=f next_pid=22466 next_prio=94
f-22466 [002] d... 79045.725359: sched_switch: prev_comm=f prev_pid=22466 prev_prio=94 prev_state=R ==> next_comm=f next_pid=22473 next_prio=94
f-22473 [002] d... 79045.825356: sched_switch: prev_comm=f prev_pid=22473 prev_prio=94 prev_state=R ==> next_comm=f next_pid=22466 next_prio=94
f-22466 [002] d... 79045.925350: sched_switch: prev_comm=f prev_pid=22466 prev_prio=94 prev_state=R ==> next_comm=f next_pid=22473 next_prio=94
f-22473 [002] d... 79046.025346: sched_switch: prev_comm=f prev_pid=22473 prev_prio=94 prev_state=R ==> next_comm=f next_pid=22466 next_prio=94
f-22466 [002] d... 79046.125346: sched_switch: prev_comm=f prev_pid=22466 prev_prio=94 prev_state=R ==> next_comm=f next_pid=22473 next_prio=94
f-22473 [002] d... 79046.225337: sched_switch: prev_comm=f prev_pid=22473 prev_prio=94 prev_state=R ==> next_comm=f next_pid=22466 next_prio=94
f-22466 [002] d... 79046.325333: sched_switch: prev_comm=f prev_pid=22466 prev_prio=94 prev_state=R ==> next_comm=f next_pid=22473 next_prio=94
f-22473 [002] d... 79046.425328: sched_switch: prev_comm=f prev_pid=22473 prev_prio=94 prev_state=R ==> next_comm=f next_pid=22466 next_prio=94
f-22466 [002] d... 79046.525324: sched_switch: prev_comm=f prev_pid=22466 prev_prio=94 prev_state=R ==> next_comm=f next_pid=22473 next_prio=94
f-22473 [002] d... 79046.625319: sched_switch: prev_comm=f prev_pid=22473 prev_prio=94 prev_state=R ==> next_comm=f next_pid=22466 next_prio=94
f-22466 [002] d... 79046.641320: sched_switch: prev_comm=f prev_pid=22466 prev_prio=94 prev_state=R ==> next_comm=o next_pid=22506 next_prio=120
o-22506 [002] d... 79046.690335: sched_switch: prev_comm=o prev_pid=22506 prev_prio=120 prev_state=R ==> next_comm=f next_pid=22466 next_prio=94

The throttling is per-rq, so even if the RR tasks keep switching
between each other, the throttling will take place if there is
any sched other task.

On Christoph's case, the other task will be the kworker, like bellow:

f-22466 [002] d... 79294.430542: sched_switch: prev_comm=f prev_pid=22466 prev_prio=94 prev_state=R ==> next_comm=f next_pid=22473 next_prio=94
f-22473 [002] d... 79294.483539: sched_switch: prev_comm=f prev_pid=22473 prev_prio=94 prev_state=R ==> next_comm=kworker/2:1 next_pid=22198 next_prio=120
kworker/2:1-22198 [002] d... 79294.483544: sched_switch: prev_comm=kworker/2:1 prev_pid=22198 prev_prio=120 prev_state=S ==> next_comm=f next_pid=22473 next_prio=94
f-22473 [002] d... 79294.530537: sched_switch: prev_comm=f prev_pid=22473 prev_prio=94 prev_state=R ==> next_comm=f next_pid=22466 next_prio=94
f-22466 [002] d... 79294.630541: sched_switch: prev_comm=f prev_pid=22466 prev_prio=94 prev_state=R ==> next_comm=f next_pid=22473 next_prio=94

The throttling allowed the kworker to run, but once the kworker went to
sleep, the RT tasks started to work again. In the previous behavior,
the system would either go idle, or the kworker would starve because
the runtime become infinity for RR tasks.

-- Daniel