Re: [RFC PATCH 3/4] sched/fair: Calculate the scan depth for idle balance based on system utilization

From: Peter Zijlstra
Date: Wed Jun 21 2023 - 07:19:12 EST


On Tue, Jun 13, 2023 at 12:18:57AM +0800, Chen Yu wrote:
> When a CPU is about to enter idle, it invokes newidle_balance() to pull
> some tasks from other runqueues. Although there is a per-domain
> max_newidle_lb_cost to throttle newidle_balance(), it would be
> good to further limit the scan based on overall system utilization.
> The reason is that nothing prevents newidle_balance() from running
> simultaneously on multiple CPUs. Since each newidle_balance() has to
> traverse all the CPUs to calculate the statistics one by one, the
> total time cost of newidle_balance() could be O(n^2). This is bad for
> both performance and power saving.

Another possible solution is to keep a copy of struct sg_lb_stats in
sd->child->shared (i.e. below the NUMA domains) and protect it with a lock.

Then have update_sd_lb_stats() do something like:

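	/*
	 * Here sds is the shared structure of sd->child, holding the
	 * cached sgs copy and its sg_lock (not the sd_lb_stats argument).
	 */
	struct sg_lb_stats *sgs = &sds->sgs;

	if (raw_spin_trylock(&sds->sg_lock)) {
		struct sg_lb_stats tmp;

		/* ... collect tmp ... */

		sds->sgs = tmp;
		raw_spin_unlock(&sds->sg_lock);
	}

	/* ... use sgs ... */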

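Then you always have a 'recent' copy available, while avoiding concurrent
updates: if another CPU already holds the lock, the trylock fails and you
simply use the cached stats instead of redoing the scan.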