Re: [RFC PATCH V3 6/6] sched/fair: Implement starvation monitor

From: Daniel Bristot de Oliveira
Date: Mon Jun 12 2023 - 13:22:00 EST


On 6/12/23 03:57, Joel Fernandes wrote:
> Hello,
>
> On Thu, Jun 8, 2023 at 11:58 AM Daniel Bristot de Oliveira
> <bristot@xxxxxxxxxx> wrote:
>>
>> From: Juri Lelli <juri.lelli@xxxxxxxxxx>
>>
>> Starting deadline server for lower priority classes right away when
>> first task is enqueued might break guarantees, as tasks belonging to
>> intermediate priority classes could be uselessly preempted. E.g., a well
>> behaving (non hog) FIFO task can be preempted by NORMAL tasks even if
>> there are still CPU cycles available for NORMAL tasks to run, as they'll
>> be running inside the fair deadline server for some period of time.
>>
>> To prevent this issue, implement a starvation monitor mechanism that
>> starts the deadline server only if a (fair in this case) task hasn't
>> been scheduled for some interval of time after it has been enqueued.
>> Use pick/put functions to manage starvation monitor status.
>
> Me and Vineeth were discussing that another way of resolving this
> issue is to use a DL-server for RT as well, and then using a smaller
> deadline for RT. That way the RT is more likely to be selected due to
> its earlier deadline/period.

It would not be that different from what we have now.

One of the problems of throttling nowadays is that it accounts for a large window
of time, and any "imprecision" can cause the mechanism not to work as expected.

For example, we work on a fully-isolated CPU scenario, where some very sporadic
workload can be placed on the isolated CPU because of per-cpu kernel activities,
e.g., kworkers... We need to let them run, but for a minimal amount of time, for
instance, 20 us, to bound the interference.

The current mechanism does not give this precision because IRQ time is not
accounted as runtime for the RT throttling (which makes sense). So the RT
throttling has its 20 us stolen by IRQs, and the RT task keeps running. The
same will happen if we swap the current mechanism with a DL server for RT.
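
As a rough illustration only (a toy model, not kernel code), the arithmetic
behind the 20 us example can be sketched as below. It assumes an RT task that
never sleeps, a throttling period measured in wall-clock time, and IRQ time
that is not charged to the RT budget. The function and parameter names are
hypothetical.

```python
def fair_time_left(period_us, rt_budget_us, irq_us):
    """Time left for non-RT tasks in one throttling period (toy model).

    Assumptions: the RT task runs whenever it can, and time spent in IRQs
    is not charged to the RT budget, so the RT task needs
    rt_budget_us + irq_us of wall-clock time before it gets throttled.
    """
    wall_time_until_throttle = rt_budget_us + irq_us
    return max(0, period_us - wall_time_until_throttle)


# Intended slack: 20 us per 1 s period.
slack_without_irqs = fair_time_left(1_000_000, 999_980, 0)
# 20 us of IRQ time in the same period eats the whole slack.
slack_with_irqs = fair_time_left(1_000_000, 999_980, 20)
```

In this model, any IRQ time at or above the intended slack leaves nothing for
the other tasks, which is the imprecision described above.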

Also, using short deadlines to fake a fixed priority is... not a good start.
A fixed higher priority for one instance is not a property of a deadline-based
scheduler, and Linux already has a fixed-priority hierarchy between classes:
STOP -> DL -> RT -> CFS... It is simple, and that is good.

That is why it is better to boost CFS instead of throttling RT. By boosting
CFS, you do not need a server for RT, and we account for anything on top of CFS
for free (IRQ/DL/FIFO...).
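
A minimal sketch of the starvation-monitor idea from the patch description,
written as a small Python model rather than kernel code. The class, method
names, and the threshold parameter are hypothetical; the actual patch hooks
into the scheduler's pick/put paths, which this only imitates.

```python
class StarvationMonitor:
    """Toy model: boost fair tasks only after they have waited too long."""

    def __init__(self, threshold_us):
        self.threshold_us = threshold_us
        # Time since which a runnable fair task has been waiting, or None.
        self.waiting_since = None

    def on_enqueue(self, now_us):
        # Start monitoring when a fair task becomes runnable.
        if self.waiting_since is None:
            self.waiting_since = now_us

    def on_pick(self, now_us):
        # A fair task got the CPU, so it is not starving.
        self.waiting_since = None

    def on_put(self, now_us, still_runnable):
        # The fair task was taken off the CPU; monitor again if it can run.
        self.waiting_since = now_us if still_runnable else None

    def server_needed(self, now_us):
        # Start the deadline server only after the waiting threshold.
        if self.waiting_since is None:
            return False
        return now_us - self.waiting_since >= self.threshold_us


monitor = StarvationMonitor(threshold_us=1000)
monitor.on_enqueue(0)
early = monitor.server_needed(500)    # still within the threshold
late = monitor.server_needed(1000)    # starved long enough to boost
```

The point of the model is that nothing runs above RT while fair tasks are
making progress on their own; the boost only shows up after a real starvation
interval.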

>
> Another approach could be to implement the 0-laxity scheduling as a
> general SCHED_DEADLINE feature, perhaps through a flag. And allow DL
> tasks to opt-in to 0-laxity scheduling unless there are idle cycles.
> And then opt-in the feature for the CFS deadline server task.

A 0-laxity scheduler is not as simple as it sounds, as the priority also depends
on the "C" (runtime, generally WCET), which is hard to find and embeds
pessimism. Also, having such a feature would make other mechanisms harder, as
well as debugging things. For example, proxy-execution or a more precise
schedulability test...
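
To make the dependency on "C" concrete, here is a small sketch comparing an
EDF pick with a least-laxity pick, using the textbook definition
laxity = deadline - now - remaining runtime. The task values and helper names
are made up for illustration.

```python
from dataclasses import dataclass


@dataclass
class Task:
    name: str
    deadline_us: int    # absolute deadline
    remaining_us: int   # estimate of remaining runtime ("C")


def laxity(task, now_us):
    # Slack left if the task started running right now.
    return task.deadline_us - now_us - task.remaining_us


def edf_pick(tasks):
    # EDF looks only at the deadline.
    return min(tasks, key=lambda t: t.deadline_us)


def llf_pick(tasks, now_us):
    # Least-laxity also depends on the runtime estimate.
    return min(tasks, key=lambda t: laxity(t, now_us))


a = Task("A", deadline_us=1000, remaining_us=100)
b = Task("B", deadline_us=1500, remaining_us=300)
# Same task B, but with a pessimistic runtime estimate.
b_pessimistic = Task("B", deadline_us=1500, remaining_us=700)

edf_choice = edf_pick([a, b_pessimistic]).name
llf_choice = llf_pick([a, b], now_us=0).name
llf_choice_pessimistic = llf_pick([a, b_pessimistic], now_us=0).name
```

With the pessimistic estimate, the laxity-based pick flips from A to B while
the EDF pick stays on A, so the scheduling order ends up depending on how
good the runtime estimate is.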

In a paper, the scheduler alone is the solution. In real life, the solution
for problems like locking is as fundamental as the scheduler. We need to keep
things simple to expand on these other topics as well.

So, I do not think we need all the drawbacks of a mixed solution just to fix
the throttling problem, and EDF is more capable and better explored for the
general case.

With this patch's idea (and expansions), we can fix the throttling problem
without breaking other behaviors like scheduling order...

>
> Lastly, if the goal is to remove RT throttling code eventually, are
> you also planning to remove RT group scheduling as well? Are there
> users of RT group scheduling that might be impacted? On the other
> hand, RT throttling / group scheduling code can be left as it is
> (perhaps documenting it as deprecated) and the server stuff can be
> implemented via a CONFIG option.

I think that the idea is to have the DL servers eventually replace group
scheduling. But I also believe that it is better to start by solving the
throttling and then move to other constructions on top of the mechanism.

-- Daniel
>
> - Joel
>
>> Signed-off-by: Juri Lelli <juri.lelli@xxxxxxxxxx>
>> Signed-off-by: Daniel Bristot de Oliveira <bristot@xxxxxxxxxx>