Re: [PATCH v3 2/5] sched/deadline: Fix reclaim inaccuracy with SMP

From: Vineeth Remanan Pillai
Date: Tue May 16 2023 - 22:17:28 EST


Hi Luca,

On Tue, May 16, 2023 at 12:19 PM luca abeni <luca.abeni@xxxxxxxxxxxxxxx> wrote:
>
> > I was thinking it should probably
> > be okay for tasks to reclaim differently based on what free bw is
> > left on the cpu it is running. For eg: if cpu 1 has two tasks of bw
> > .3 each, each task can reclaim "(.95 - .6) / 2" and another cpu with
> > only one task(.3 bandwidth) reclaims (.95 - .3). So both cpus
> > utilization is .95 and tasks reclaim what is available on the cpu.
>
> I suspect (but I am not sure) this only works if tasks do not migrate.
>
>From what I am seeing, if the reserved bandwidth of all tasks on a cpu
is less than Umax, then this works. Even with migration, if the task
lands on another cpu where the new running_bw < Umax, then it runs and
reclaims the free bandwidth. But this breaks if running_bw > Umax and
it can happen if total_bw is within limits, but a cpu is overloaded.
For eg: four tasks with reservation (7, 10) on a three cpu system.
Here two cpus will have running_bw = .7 but third cpu will be 1.4
even though total_bw = 2.80 which is less than the limit of 2.85.

>
> > With "1 - Uinact", where Uinact accounts for a portion of global free
> > bandwidth, tasks reclaim proportionately to the global free bandwidth
> > and this causes tasks with lesser bandwidth to reclaim lesser when
> > compared to higher bandwidth tasks even if they don't share the cpu.
> > This is what I was seeing in practice.
>
> Just to be sure: is this with the "original" Uextra setting, or with
> your new "Uextra = Umax - this_bw" setting?
> (I am not sure, but I suspect that "1 - Uinact - Uextra" with your new
> definition of Uextra should work well...)
>
I am seeing this with original Uextra setting where the global bandwidth
is accounted. With "Uextra = Umax - this_bw", reclaiming seems to be
correct and I think it is because it considers local bandwidth only.

> > With dq = -(max{u_i, (Umax - Uinact - Uextra)} / Umax) * dt (1)
> > TID[636]: RECLAIM=1, (r=3ms, d=100ms, p=100ms), Util: 95.08
> > TID[635]: RECLAIM=1, (r=3ms, d=100ms, p=100ms), Util: 95.07
> > TID[637]: RECLAIM=1, (r=3ms, d=100ms, p=100ms), Util: 95.06
> >
> > With dq = -(max{u_i, (1 - Uinact - Uextra)} / Umax) * dt (2)
> > TID[601]: RECLAIM=1, (r=3ms, d=100ms, p=100ms), Util: 35.65
> > TID[600]: RECLAIM=1, (r=3ms, d=100ms, p=100ms), Util: 35.65
> > TID[602]: RECLAIM=1, (r=3ms, d=100ms, p=100ms), Util: 35.65
>
> Maybe I am missing something and I am misunderstanding the situation,
> but my impression was that this is the effect of setting
> Umax - \Sum(u_i / #cpus in the root domain)
> I was hoping that with your new Umax setting this problem could be
> fixed... I am going to double-check my reasoning.
>
Even with the Umax_reclaim changes, equation (1) is the one which
reclaims upto 95% when number of tasks is less than the number of
cpus. With more tasks than cpus, eq (1) still reclaims more than
eq (2) and cpu utilization caps at 95%. I also need to dig more to
understand the reason behind this.

Thanks for looking into this, I will also study more on this and
keep you posted..

Thanks,
Vineeth