Re: [RFC PATCH 3/4] sched/fair: Calculate the scan depth for idle balance based on system utilization

From: Gautham R. Shenoy
Date: Thu Jun 22 2023 - 02:01:28 EST


On Wed, Jun 21, 2023 at 03:29:27PM +0800, Chen Yu wrote:
> Hi Gautham,

[..snip..]

> >
> I have an idea to limit scan depth for newidle balance for big domains.
> These domains should have CPUs higher than/equals to LLC(MC domain).
> However it seems that in current kernel only domain with SD_SHARE_PKG_RESOURCES
> flag set will have the shared struct sched_domain_shared among the CPUs in this
> domain. And this is reasonable because the cost to access the struct sched_domain_shared
> is lower if the CPUs share cache.

Yes, this was a conscious choice. But now, we may need to explore
using it and consider the trade-off between cachelines bouncing across
LLCs while updating sd_shared on higher domains (I would still prefer
limiting it to below the NUMA domain!) vs having to recompute stuff at
the higher domains, if something has been recently computed for this
level by some other CPU in the domain.

> Since ILB_UTIL relies on the sched_domain_shared
> to get the scan depth, I removed the restriction of SD_SHARE_PKG_RESOURCES
> during sched_domain_shared assignment.

That's what Prateek was exploring as well. I wonder however, if we
should have sd_shared defined only for non-NUMA domains.

> If non-LLC domain's sched_domain_shared is only used for ILB_UTIL,
> the overhead should be not too high(only periodic load balance would
> write to sched_domain_shared).
> Here is a untest patch which shows what
> I'm thinking of, and I'll do some refinement based on this:

Thanks a lot for the patch. I will add it to our test queue.

--
Thanks and Regards
gautham.