Re: [RFC][PATCH 13/15] sched/fair: Implement latency-nice

From: Peter Zijlstra
Date: Thu Jun 08 2023 - 06:36:01 EST


On Tue, Jun 06, 2023 at 04:54:13PM +0200, Vincent Guittot wrote:
> On Wed, 31 May 2023 at 14:47, Peter Zijlstra <peterz@xxxxxxxxxxxxx> wrote:
> >
> > Implement latency-nice as a modulation of the EEVDF r_i parameter,
> > specifically apply the inverse sched_prio_to_weight[] relation on
> > base_slice.
> >
> > Given a base slice of 3 [ms], this gives a range of:
> >
> > latency-nice 19: 3*1024 / 15 ~= 204.8 [ms]
> > latency-nice -20: 3*1024 / 88761 ~= 0.034 [ms]
>
> I have reread the publication. I have question about
>
> Theorem 1: The lag of any active client k in a steady system is
> bounded as follows,
> -rmax < lagk (d) < max(rmax ; q);
>
> and
>
> Corollary 2: Consider a steady system and a client k such that no
> request of client k is larger than a
> time quantum. Then at any time t, the lag of client k is bounded as follows:
> -q < lagk (t) < q
>
> q being the time quanta a task can run
> and rmax the maximum slice of active task
>
> I wonder how it applies to us. What is our time quanta q ?

> I guess that it's the tick because it is assumed that the algorithm
> evaluates which task should run next for each q interval in order to
> fulfill the fairness IIUC.So I don't think that we can assume a q
> shorter than the tick (at least with current implementation) unless we
> trigger some additional interrupts

Indeed, TICK_NSEC is our unit of accounting (unless HRTICK, but I've not
looked at that, it might not DTRT -- also, I still need to rewrite that
whole thing to not be so damn expensive).

> Then asking for a request shorter than the tick also means that
> scheduler must enqueue a new request (on behalf of the task) during
> the tick and evaluate if the task is still the one to be scheduled
> now.

If there is no 'interrupt', we won't update time and the scheduler can't
do anything -- as you well know. The paper only requires (and we
slightly violate this) to push forward the deadline. See the comment
with update_deadline().

Much like pure EDF without a combined CBS.

> So similarly to q, the request size r should be at least a tick
> in order to reevaluate which task will run next after the end of a
> request. In fact, the real limit is : r/wi >= tick/(Sum wj)

> We can always not follow these assumptions made in the publication but
> I wonder how we can then rely on its theorems and corollaries

Again, I'm not entirely following, the corollaries take r_i < q into
account, that's where the max(rmax, q) term comes from.

You're right in that r_i < q does not behave 'right', but it doesn't
invalidate the results. Note that if a task overshoots, it will build of
significant negative lag (right side of the tree) and won't be eligible
for it's next actual period. This 'hole' in the schedule is then used to
make up for the extra time it used previously.

The much bigger problem with those bounds is this little caveat: 'in a
steady state'. They conveniently fail to consider the impact of
leave/join operations on the whole thing -- and that's a much more
interesting case.