Re: [RFC PATCH 3/4] sched/fair: Calculate the scan depth for idle balance based on system utilization

From: Chen Yu
Date: Fri Jun 23 2023 - 10:45:36 EST


On 2023-06-23 at 22:33:23 +0800, Chen Yu wrote:
> Hi Peter,
> On 2023-06-21 at 13:17:21 +0200, Peter Zijlstra wrote:
> > On Tue, Jun 13, 2023 at 12:18:57AM +0800, Chen Yu wrote:
> > > When a CPU is about to enter idle, it invokes newidle_balance() to pull
> > > some tasks from other runqueues. Although there is a per-domain
> > > max_newidle_lb_cost to throttle newidle_balance(), it would be good
> > > to further limit the scan based on overall system utilization. The
> > > reason is that nothing prevents newidle_balance() from running
> > > simultaneously on multiple CPUs. Since each newidle_balance() has to
> > > traverse all the CPUs to calculate the statistics one by one, the
> > > total time spent in newidle_balance() could be O(n^2) when n CPUs do
> > > this at once. This is not good for performance or power saving.
> >
> > Another possible solution is to keep struct sg_lb_stats in
> > sd->child->shared (below the NUMA domains) and put a lock around it.
> >
> > Then have update_sd_lb_stats() do something like:
> >
> > struct sg_lb_stats *sgs = &sds->sgs;
> >
> > if (raw_spin_trylock(&sds->sg_lock)) {
> > 	struct sg_lb_stats tmp;
> >
> > 	... collect tmp
> >
> > 	sds->sgs = tmp;
> > 	raw_spin_unlock(&sds->sg_lock);
> > }
> >
> > ... use sgs
> >
> > Then you know you've always got a 'recent' copy but avoid the concurrent
> > updates.
> Thanks for taking a look and for the suggestions! Yes, this is a good idea.
> By doing this we can further limit the number of CPUs doing the update in
> parallel, and allow a newidle CPU to reuse the idle load balance data
> collected by others. The lock only allows one CPU in that domain to iterate
> over the whole group, so the bottleneck might be how fast the CPU that grabs
> the lock can finish collecting the tmp sgs data. For the MC domain this
> should not take much time; for the higher domains between MC and NUMA, it
> depends on how many CPUs there are in that domain.
I just realized that it's a trylock, so it should not block other CPUs that
launch the idle balance; it just lets one CPU at a time update the 'snapshot'.
I'll do some tests.
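A rough user-space sketch of the trylock/snapshot pattern, to make the non-blocking behaviour concrete. This is not kernel code: struct sg_stats, struct cpu_sample, struct shared_stats, collect_stats() and get_group_stats() are hypothetical names, the stats are a tiny subset of what struct sg_lb_stats carries, and a pthread mutex stands in for raw_spinlock_t. The caller that wins the trylock walks all CPUs and publishes a fresh snapshot; a caller that loses it returns immediately with the last published copy instead of waiting.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, heavily trimmed stand-in for struct sg_lb_stats. */
struct sg_stats {
	unsigned long group_load;
	unsigned long group_util;
	unsigned int sum_nr_running;
};

/* Hypothetical per-CPU sample that the group scan reads. */
struct cpu_sample {
	unsigned long load;
	unsigned long util;
	unsigned int nr_running;
};

/*
 * Hypothetical stand-in for the shared area (sd->child->shared in the
 * suggestion): one cached snapshot plus the lock serializing its refresh.
 * A pthread mutex is used here in place of raw_spinlock_t.
 */
struct shared_stats {
	pthread_mutex_t lock;
	struct sg_stats sgs;
};

/* The O(nr_cpus) walk we want at most one CPU to do at a time. */
static void collect_stats(const struct cpu_sample *cpus, size_t nr_cpus,
			  struct sg_stats *out)
{
	struct sg_stats tmp = { 0 };

	assert(cpus != NULL || nr_cpus == 0);
	for (size_t i = 0; i < nr_cpus; i++) {
		tmp.group_load += cpus[i].load;
		tmp.group_util += cpus[i].util;
		tmp.sum_nr_running += cpus[i].nr_running;
	}
	*out = tmp;
}

/*
 * Return the stats the caller should use. The caller that wins the trylock
 * refreshes and publishes the snapshot; any other caller does not wait and
 * reuses the last published copy. *refreshed tells which path was taken.
 */
static struct sg_stats get_group_stats(struct shared_stats *sh,
				       const struct cpu_sample *cpus,
				       size_t nr_cpus, bool *refreshed)
{
	if (pthread_mutex_trylock(&sh->lock) == 0) {
		struct sg_stats tmp;

		collect_stats(cpus, nr_cpus, &tmp);
		sh->sgs = tmp;			/* publish the new snapshot */
		pthread_mutex_unlock(&sh->lock);
		*refreshed = true;
		return tmp;
	}

	/*
	 * Lock busy: someone else is refreshing. Like the pseudocode, read the
	 * cached copy without the lock. With real concurrent threads this read
	 * races with the writer (stale or torn data, and a data race in C), so
	 * a real version would need e.g. a seqcount or atomics here.
	 */
	*refreshed = false;
	return sh->sgs;
}
```

For example, if sh->lock is already held (simulating another CPU in the middle of a refresh), get_group_stats() returns the previous snapshot with *refreshed == false; once the lock is released, the next call walks the CPUs again and publishes new numbers.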

thanks,
Chenyu
> I'll create a prototype based on your suggestion and collect test data.
>
> thanks,
> Chenyu