Re: [RFC PATCH 3/4] sched/fair: Calculate the scan depth for idle balance based on system utilization

From: K Prateek Nayak
Date: Mon Jul 10 2023 - 22:52:37 EST


Hello Chenyu,

Thank you for taking a look at the results.

On 7/10/2023 9:07 PM, Chen Yu wrote:
> Hi Prateek,
>
> thanks for testing this patch,
> On 2023-07-10 at 16:36:47 +0530, K Prateek Nayak wrote:
>> Hello Chenyu,
>>
>> Thank you for sharing this extended version. Sharing the results from
>> testing below.
>>
>> tl;dr
>>
>> - tbench, netperf and unixbench-spawn see an improvement with ILB_UTIL.
>>
>> - schbench (old) sees a regression in tail latency once the system is
>> heavily loaded. DeathStarBench and SPECjbb also see a small drop under
>> those conditions.
>>
>> - The rest of the benchmark results do not vary much.
>>
>>
> [...]
>> I have a couple of theories:
>>
> Thanks for the insights. I agree the risks you mention below could impact
> performance. Some thoughts below:
>> o Either new_idle_balance is failing to find an overloaded busy rq as a
>> result of the limit.
>>
> If the system is overloaded, ILB_UTIL finds a relatively busy rq and pulls from it.
> There may not be much difference between a relatively busy rq and the busiest
> one, because all rqs are quite busy, I suppose?
>> o Or, there is a chain reaction where pulling from a loaded rq which is not
>> the most loaded leads to more new_idle_balance attempts, which degrades
>> performance.
> Yeah, it could be possible that ILB_UTIL finds a relatively busy rq, but the
> imbalance is not high, so the ilb decides not to pull from it. However, the
> busiest rq is still waiting for someone to help, and this could trigger idle
> load balance more frequently.

Yes, I was thinking along similar lines.
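
To make the concern concrete, here is a tiny standalone sketch (all names,
loads and the depth are made up by me, this is not code from the patch) of
how a depth-limited scan can settle for a less-loaded rq while the true
busiest rq keeps waiting for help:

/* Hypothetical userspace model, not kernel code: a truncated search in
 * the style of ILB_UTIL stops after 'depth' rqs and can miss the rq
 * with the highest load. */
#include <stdio.h>

#define NR_RQS 8

/* Per-rq load; the highest load sits beyond the limited scan depth. */
static const unsigned long rq_load[NR_RQS] = {
	40, 55, 60, 35, 50, 45, 95, 70
};

/* Scan at most 'depth' rqs and return the busiest one seen so far. */
static int find_busiest_limited(int depth)
{
	int i, busiest = 0;

	for (i = 1; i < depth && i < NR_RQS; i++)
		if (rq_load[i] > rq_load[busiest])
			busiest = i;
	return busiest;
}

int main(void)
{
	/* A full scan finds rq6 (load 95); depth 4 settles for rq2 (60). */
	printf("full scan: rq%d\n", find_busiest_limited(NR_RQS));
	printf("depth 4:   rq%d\n", find_busiest_limited(4));
	return 0;
}

If the busiest rq stays overloaded after such a pull, the imbalance
persists and CPUs that go idle keep retriggering new_idle_balance, which
would match the chain reaction described above.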

>>
>> I'll go back and get some data to narrow down the cause. Meanwhile, if
>> there is any specific benchmark you would like me to run on the test
>> system, please do let me know.
>>
> Another hint might be that, as Gautham and Peter suggested, we should apply
> ILB_UTIL only to non-NUMA domains. In the above patch all the domains have
> sd_share, which could have a negative impact when accessing/writing
> cross-node data. Sorry, I did not post the latest version with the NUMA
> domain excluded previously, as I was trying to create a prototype to further
> speed up idle load balance by reusing the statistics suggested by Peter. I
> will send it out once it is stable.

Sure. I'll keep an eye out for the next version :)
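
For reference, here is a rough standalone sketch of what I imagine the
non-NUMA restriction could look like (SD_NUMA's value, ilb_scan_depth()
and the utilization scaling are illustrative stand-ins of mine, not the
actual logic from your patch):

#include <stdio.h>

#define SD_NUMA		0x0400	/* stand-in for the kernel's SD_NUMA flag */

struct sched_domain {
	int flags;
	int nr_groups;
};

/* Scale the scan depth with (100 - util): the busier the system, the
 * fewer groups get scanned. NUMA domains always get a full scan so no
 * shared per-domain state is read or written across nodes. */
static int ilb_scan_depth(const struct sched_domain *sd, int util_pct)
{
	int depth;

	if (sd->flags & SD_NUMA)
		return sd->nr_groups;	/* full scan, no cross-node sd_share */

	/* Busier system => shallower scan; always look at one group. */
	depth = 1 + sd->nr_groups * (100 - util_pct) / 100;
	return depth > sd->nr_groups ? sd->nr_groups : depth;
}

int main(void)
{
	struct sched_domain mc   = { .flags = 0,       .nr_groups = 16 };
	struct sched_domain numa = { .flags = SD_NUMA, .nr_groups = 8  };

	printf("MC   at 80%% util -> depth %d\n", ilb_scan_depth(&mc, 80));
	printf("NUMA at 80%% util -> depth %d\n", ilb_scan_depth(&numa, 80));
	return 0;
}

The exact scaling does not matter for the point; the interesting part is
gating the depth limit on !(sd->flags & SD_NUMA) so that sd_share is only
ever touched within a node.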

>
> [..snip..]
>

--
Thanks and Regards,
Prateek