Re: [PATCH v3 2/5] sched/deadline: Fix reclaim inaccuracy with SMP

From: luca abeni
Date: Fri May 19 2023 - 06:18:20 EST


On Fri, 19 May 2023 11:56:21 +0200
luca abeni <luca.abeni@xxxxxxxxxxxxxxx> wrote:

> Hi,
>
> sorry for coming back to this discussion, but there is something I still
> do not understand:
>
> On Tue, 16 May 2023 11:08:18 -0400
> Vineeth Remanan Pillai <vineeth@xxxxxxxxxxxxxxx> wrote:
> [...]
> > I had tested this and it was reclaiming much less compared to the
> > first one. I had 3 tasks with reservation (3,100) and 3 cpus.
>
> So, just to confirm: here you have only 3 SCHED_DEADLINE tasks,
> scheduled on a root domain containing only 3 CPUs (dl_bw_cpus() returns
> 3)... Right?
> So, the utilization of each task is 3/100 = 0.03 and Uextra is
> 1 - (0.03 * 3) / 3 = 0.97.

OK, sorry again... I found my error immediately after sending the email.
Uextra is computed as "Umax - ...", not "1 - ...".
So, I now understand where the 35% comes from.
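
For reference, the arithmetic behind the two numbers can be sketched as
below. This is only a back-of-the-envelope model, not kernel code: it
assumes Umax = 0.95 (the default RT throttling limit, consistent with the
95% measured in the quoted tests), Uinact = 0, and that a task whose
budget drains at "rate" runs for runtime / rate in each period. The
helper names are made up for illustration.

```python
# Simplified model of equation (2) for 3 tasks (3ms/100ms) on 3 CPUs.
# Assumptions (not taken from the kernel sources): U_MAX = 0.95, Uinact = 0.

U_MAX = 0.95
N_CPUS = 3
TASK_UTILS = [0.03, 0.03, 0.03]  # 3ms runtime over a 100ms period


def uextra(base):
    """Uextra = base - (sum of task utilizations) / number of CPUs."""
    return base - sum(TASK_UTILS) / N_CPUS


def eq2_util_percent(u_i, u_extra, u_inact=0.0):
    """Utilization reached under dq = -(max{u_i, 1 - Uinact - Uextra} / Umax) * dt."""
    rate = max(u_i, 1.0 - u_inact - u_extra) / U_MAX
    # The budget u_i * period drains at `rate`, so the task runs u_i / rate
    # of each period (capped at 100%).
    return 100.0 * min(1.0, u_i / rate)


uextra_from_one = uextra(1.0)     # my first (wrong) computation: 0.97
uextra_from_umax = uextra(U_MAX)  # "Umax - ...": 0.92

print(eq2_util_percent(0.03, uextra_from_one))   # about 95.0
print(eq2_util_percent(0.03, uextra_from_umax))  # about 35.6
```

With Uextra = 0.92 the second term of the max is 1 - 0.92 = 0.08, which
dominates u_i = 0.03, and 0.03 / (0.08 / 0.95) is about 0.356, close to
the measured 35.65%.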

I now _suspect_ the correct equation should be
dq = -(max{u_i / Umax, (Umax - Uinact - Uextra)}) * dt
but I want to test it before wasting your time again; I'll write more
after performing some more tests.
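
As a quick sanity check before the real tests, the three equations can be
compared on the 3-task scenario with a small script. This is a sketch
under the same simplifying assumptions as the discussion (Umax = 0.95,
Uinact = 0, Uextra = Umax - 0.03 = 0.92); the function names are
hypothetical and it says nothing about how the equations behave with
more tasks than CPUs or with migrations.

```python
# Compare the depletion rates of equations (1), (2) and the candidate
# equation for a single (3ms, 100ms) task in the 3-task / 3-CPU scenario.
# Assumed values, not read from a running kernel:
U_MAX = 0.95
U_INACT = 0.0
U_EXTRA = U_MAX - (3 * 0.03) / 3  # 0.92
U_I = 0.03


def rate_eq1(u_i):
    """dq/dt magnitude for eq (1): max{u_i, Umax - Uinact - Uextra} / Umax."""
    return max(u_i, U_MAX - U_INACT - U_EXTRA) / U_MAX


def rate_eq2(u_i):
    """dq/dt magnitude for eq (2): max{u_i, 1 - Uinact - Uextra} / Umax."""
    return max(u_i, 1.0 - U_INACT - U_EXTRA) / U_MAX


def rate_candidate(u_i):
    """dq/dt magnitude for the candidate: max{u_i / Umax, Umax - Uinact - Uextra}."""
    return max(u_i / U_MAX, U_MAX - U_INACT - U_EXTRA)


def util_percent(u_i, rate):
    """Fraction of the period the task can run before its budget is depleted."""
    return 100.0 * min(1.0, u_i / rate)


results = {
    "eq1": util_percent(U_I, rate_eq1(U_I)),
    "eq2": util_percent(U_I, rate_eq2(U_I)),
    "candidate": util_percent(U_I, rate_candidate(U_I)),
}
for name, util in results.items():
    print(f"{name}: {util:.2f}%")
```

In this scenario the candidate and eq (1) both give about 95%, because
u_i / Umax is the larger term in both; the two only differ when the
(Umax - Uinact - Uextra) term dominates, which is the case that still
needs to be tested.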


Luca

> And since all the tasks are always active, Uinact = 0...
> Is this understanding right?
>
> If so:
> > With dq = -(max{u_i, (Umax - Uinact - Uextra)} / Umax) * dt (1)
> > TID[636]: RECLAIM=1, (r=3ms, d=100ms, p=100ms), Util: 95.08
> > TID[635]: RECLAIM=1, (r=3ms, d=100ms, p=100ms), Util: 95.07
> > TID[637]: RECLAIM=1, (r=3ms, d=100ms, p=100ms), Util: 95.06
> >
> > With dq = -(max{u_i, (1 - Uinact - Uextra)} / Umax) * dt (2)
> > TID[601]: RECLAIM=1, (r=3ms, d=100ms, p=100ms), Util: 35.65
> > TID[600]: RECLAIM=1, (r=3ms, d=100ms, p=100ms), Util: 35.65
> > TID[602]: RECLAIM=1, (r=3ms, d=100ms, p=100ms), Util: 35.65
>
> Here, we should have
> dq = -(max{0.03, (1 - 0 - 0.97)} / Umax) * dt
> = -(0.03 / Umax) * dt
> which reclaims up to Umax... So, the utilization should be 95%.
> Since you measured 35.65%, it means that (1 - Uextra) is much larger
> than 0.03... So, maybe you found some bug in the Uextra computation?
>
> Can you try printing the extra_bw value, to check what happened?
>
>
>
> Thanks,
> Luca
>
> >
> > As the task bandwidth goes higher, equation (2) reclaims more, but
> > equation (1) stays constant at 95% as long as the number of tasks
> > does not exceed the number of cpus. If the number of tasks is more
> > than the number of cpus, eq (2) fares better in reclaiming than eq (1)
> >
> > eq (1) with 5 tasks (3,100)
> > TID[627]: RECLAIM=1, (r=3ms, d=100ms, p=100ms), Util: 28.64
> > TID[626]: RECLAIM=1, (r=3ms, d=100ms, p=100ms), Util: 28.64
> > TID[629]: RECLAIM=1, (r=3ms, d=100ms, p=100ms), Util: 28.62
> > TID[628]: RECLAIM=1, (r=3ms, d=100ms, p=100ms), Util: 29.00
> > TID[630]: RECLAIM=1, (r=3ms, d=100ms, p=100ms), Util: 28.99
> >
> > Here top shows 3 cpus in the range ~45 to 50% util
> >
> > eq (2) with 5 tasks (3,100)
> > TID[667]: RECLAIM=1, (r=3ms, d=100ms, p=100ms), Util: 57.20
> > TID[670]: RECLAIM=1, (r=3ms, d=100ms, p=100ms), Util: 57.79
> > TID[668]: RECLAIM=1, (r=3ms, d=100ms, p=100ms), Util: 57.11
> > TID[666]: RECLAIM=1, (r=3ms, d=100ms, p=100ms), Util: 56.34
> > TID[669]: RECLAIM=1, (r=3ms, d=100ms, p=100ms), Util: 55.82
> >
> > And here top shows all 3 cpus with 95% util
> >
> > > I'll write more about this later... And thanks for coping with all
> > > my comments!
> > >
> > Thanks :-)
> >
> > Vineeth
>