Re: [RFC PATCH 3/4] sched/fair: Calculate the scan depth for idle balance based on system utilization

From: Chen Yu
Date: Mon Jul 10 2023 - 11:37:31 EST


Hi Prateek,

thanks for testing this patch,
On 2023-07-10 at 16:36:47 +0530, K Prateek Nayak wrote:
> Hello Chenyu,
>
> Thank you for sharing this extended version. Sharing the results from
> testing below.
>
> tl;dr
>
> - tbench, netperf and unixbench-spawn see an improvement with ILB_UTIL.
>
> - schbench (old) sees a regression in tail latency once system is heavily
> loaded. DeathStarBench and SPECjbb too see a small drop under those
> conditions.
>
> - Rest of the benchmark results do not vary much.
>
>
[...]
> I have a couple of theories:
>
Thanks for the insights, I agree the risk you mentioned below could impact the
performance. Some thoughts below:
> o Either new_idle_balance is failing to find an overloaded busy rq as a
> result of the limit.
>
If the system is overloaded, the ilb_util finds a relatively busy rq and pulls from it.
There could be no much difference between a relatively busy rq and the busiest one,
because all rqs are quite busy I suppose?
> o Or, there is a chain reaction where pulling from a loaded rq which is not
> the most loaded, will lead to more new_idle_balancing attempts which is
> degrading performance.
Yeah, it could be possible that the ilb_util finds a relatively busy rq, but the
imbalance is not high so ilb decides not pull from it. However the busiest
rq is still waiting for someone to help, and this could trigger idle load
balance more frequently.
>
> I'll go back and get some data to narrow down the cause. Meanwhile if
> there is any specific benchmark you would like me to run on the test
> system, please do let me know.
>
Another hints might be that, as Gautham and Peter suggested, we should apply ILB_UTIL
to non-Numa domains. In above patch all the domains has sd_share which could
bring negative impact when accessing/writing cross-node data.
Sorry I did not post the latest version with Numa domain excluded previously as
I was trying to create a protype to further speed up idle load balance by
reusing the statistic suggested by Peter. I will send them out once it is stable.

Thanks again,
Chenyu
> --
> Thanks and Regards,
> Prateek