Re: [RFC][PATCH 1/5] sched/fair: Fix select_idle_cpu()s cost accounting

From: Mel Gorman
Date: Sat Jan 09 2021 - 09:01:04 EST


On Fri, Jan 08, 2021 at 09:21:48PM +0100, Peter Zijlstra wrote:
> On Fri, Jan 08, 2021 at 10:27:38AM +0000, Mel Gorman wrote:
>
> > 1. avg_scan_cost is now based on the average scan cost of a rq but
> > avg_idle is still scaled to the domain size. This is a bit problematic
> > because it's comparing scan cost of a single rq with the estimated
> > average idle time of a domain. As a result, the scan depth can be much
> > larger than it was before the patch and led to some regressions.
>
> > @@ -6164,25 +6164,25 @@ static int select_idle_cpu(struct task_struct *p, struct sched_domain *sd, int t
> >                   */
> >                  avg_idle = this_rq()->avg_idle / 512;
> >                  avg_cost = this_sd->avg_scan_cost + 1;
> > -
> > -                span_avg = sd->span_weight * avg_idle;
> > -                if (span_avg > 4*avg_cost)
> > -                        nr = div_u64(span_avg, avg_cost);
> > -                else
> > +                nr = div_u64(avg_idle, avg_cost);
> > +                if (nr < 4)
> >                          nr = 4;
>
> Oooh, could it be I simply didn't remember how that code was supposed to
> work and should kick my (much) younger self for not writing a comment?
>
> Consider:
>
>         span_weight * avg_idle               avg_cost
>    nr = ---------------------- = avg_idle / -----------
>                avg_cost                     span_weight
>
> Where: avg_cost / span_weight ~= cost-per-rq
>

This would definitely make sense and I even evaluated it, but the nature
of avg_idle and the scale it works at (up to 2*sched_migration_cost)
just ended up generating lunatic values far outside the size of the
domain. Fitting that to the domain size just ended up looking silly too,
and avg_cost does not decay. Still, in principle, it's the right direction;
it's just not what the code does right now.
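
To put rough numbers on the scale problem, here is a minimal userspace
sketch of the two depth calculations from the quoted hunk. It is not the
kernel code: the function names are hypothetical, plain division stands
in for div_u64(), and the 500000ns value for sched_migration_cost is an
assumed default.

```c
#include <stdint.h>

/* Assumption: default sched_migration_cost in ns; avg_idle is capped at 2x this. */
#define SCHED_MIGRATION_COST_NS 500000ULL

/* Removed lines of the hunk: avg_idle is scaled by the domain span first. */
static uint64_t scan_depth_span_scaled(uint64_t rq_avg_idle_ns,
                                       uint64_t avg_scan_cost_ns,
                                       uint64_t span_weight)
{
    uint64_t avg_idle = rq_avg_idle_ns / 512;
    uint64_t avg_cost = avg_scan_cost_ns + 1;   /* +1 avoids dividing by zero */
    uint64_t span_avg = span_weight * avg_idle;

    if (span_avg > 4 * avg_cost)
        return span_avg / avg_cost;
    return 4;                                   /* minimum scan depth */
}

/* Added lines of the hunk: avg_cost is treated as a per-rq scan cost. */
static uint64_t scan_depth_per_rq(uint64_t rq_avg_idle_ns,
                                  uint64_t avg_scan_cost_ns)
{
    uint64_t avg_idle = rq_avg_idle_ns / 512;
    uint64_t avg_cost = avg_scan_cost_ns + 1;
    uint64_t nr = avg_idle / avg_cost;

    return nr < 4 ? 4 : nr;                     /* minimum scan depth */
}
```

With avg_idle at the 2*sched_migration_cost cap (1000000ns under the
assumed default) and a made-up avg_scan_cost of 9ns, scan_depth_per_rq()
returns 195 and scan_depth_span_scaled() with a span_weight of 16 returns
3124. Both are well beyond the 16 CPUs such a domain would actually have.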

--
Mel Gorman
SUSE Labs