Re: [PATCH v5 7/7] sched/fair: Fair server interface

From: Joel Fernandes
Date: Thu Feb 15 2024 - 08:57:34 EST


Hello, Daniel,

On 2/14/2024 9:23 AM, Daniel Bristot de Oliveira wrote:
> On 2/13/24 03:13, Joel Fernandes wrote:
>>
>>
>> On 11/4/2023 6:59 AM, Daniel Bristot de Oliveira wrote:
>>> Add an interface for fair server setup on debugfs.
>>>
>>> Each rq have three files under /sys/kernel/debug/sched/rq/CPU{ID}:
>>>
>>> - fair_server_runtime: set runtime in ns
>>> - fair_server_period: set period in ns
>>> - fair_server_defer: on/off for the defer mechanism
>>
>> Btw Daniel, there is an interesting side-effect of this interface having runtime
>> and period in 2 separate files :)
>>
>> Say I want to set a CPU to 5ms / 10ms.
>>
>> I cannot set either period or runtime to 5ms or 10ms directly.
>>
>> I have to first set period to 100ms, then set runtime to 50ms, then set period
>> to 50ms, then set runtime to 5ms, then finally set period to 10ms.
>
> Hummm yeah I could reproduce that, it seems that it is not even a problem of having
> two files, but a bug in the logic, I will have a look.

Thanks for taking a look. My colleague Suleiman hit the issue too. He's able to
not set 45ms/50ms for instance.

Also just want to mention, if you could please CC my colleagues Suleiman and
Youssef on the patches who are also working on / reviewing these:

Suleiman Souhlal <suleiman@xxxxxxxxxx>
Youssef Esmat <youssefesmat@xxxxxxxxxx>

>> The reason seems to be because otherwise runtime / period will not be
>> accomodated and will cause dl_overflow issues.
>>
>> I'd suggest providing both runtime and period in the same interface to make it
>> more easier to use. However, for the testing I am going with what we have.
>>
>> Also a request:
>>
>> I was wondering if a new version of the last 3 patches could be posted to
>> LKML or shared in a tree somewhere. I am trying to sync to mainline and
>> rebase our latest fixes on top of that, however it is difficult to do because
>> these 3 patches are in bit of a flux (example the discussion between you and
>> Peter about update_curr()). What's the best way to move forward with rebasing
>> our fix contributions?
>
> Juri and I chat about, and we think it is a good thing to re-send this patch set,
> including a fix I have to it (to avoid regression wrt rt throttling), explaining
> these things in the mailing list so peter will be able to follow the discussion.
>
> I still need to finish testing, and to make a proper cover page with all updates, the
> latest thing is here (tm):
>
> https://git.kernel.org/pub/scm/linux/kernel/git/bristot/linux.git/log/?h=dl_server_v6
>
> It is based on peter's sched/more. I will probably re-send it today or tomorrow,
> but at least you can have a look at it.
>> Another reason to send it is to get the regression test machinery running....

Sure, looking forward to it. I rebased on above tree and it applied cleanly.
What I'll do is I will send our patches today (not those in sched/more) after a
bit more testing and tweaks.

There are 2 reasons for this:
1. Can get the build robot do its thing.
2. Our internal system checks whether patches backported were posted upstream to
list.

Hope that sounds good to you and we can start reviewing as well.

> I am going with the sched/more in Peter's queue.git
>> unless you/Peter prefer something else. And I added your update_curr()
>> suggestion onto that, let me know if you disagree with it:
>>
>> @@ -1173,6 +1171,8 @@ static void update_curr(struct cfs_rq *cfs_rq)
>>
>> if (entity_is_task(curr))
>> update_curr_task(task_of(curr), delta_exec);
>> + else
>> + dl_server_update(&rq_of(cfs_rq)->fair_server, delta_exec);
>>
>> account_cfs_rq_runtime(cfs_rq, delta_exec);
>> }
>
> That part of the code was optimized by peter during the last round of discussions.
>
> It is like this now:
>
> ------------ %< -----------
> - if (entity_is_task(curr))
> - update_curr_task(task_of(curr), delta_exec);
> + if (entity_is_task(curr)) {
> + struct task_struct *p = task_of(curr);
> + update_curr_task(p, delta_exec);
> + /*
> + * Any fair task that runs outside of fair_server should
> + * account against fair_server such that it can account for
> + * this time and possibly avoid running this period.
> + */
> + if (p->dl_server != &rq->fair_server)
> + dl_server_update(&rq->fair_server, delta_exec);
> + }
> ------------ >% -----------
>
> It is not straightforward to understand... but the ideia is:
>
> if it is a task, and the server is ! of the fair server, discount time
> directly from the fair server. This also means that if dl_server is NULL
> (the server is not enabled) it will discount time from the fair server.

Yes, that makes sense. We certainly want to debit from the server even when DL
is not picking the task indirectly. I guess Peter's optimization also handles
the case where multiple servers are in play. That will help us when/if we make
RT as a server as well, right?

thanks,

- Joel