Re: [GIT PULL] Scheduler changes for v6.8

From: Wyes Karny
Date: Sun Jan 14 2024 - 13:11:26 EST


On Sun, Jan 14, 2024 at 02:03:14PM +0100, Vincent Guittot wrote:
> On Sun, 14 Jan 2024 at 13:38, Wyes Karny <wkarny@xxxxxxxxx> wrote:
> >
> > On Sun, Jan 14, 2024 at 12:18:06PM +0100, Vincent Guittot wrote:
> > > Hi Wyes,
> > >
> > > On Sunday, 14 Jan 2024 at 14:42:40 (+0530), Wyes Karny wrote:
> > > > On Wed, Jan 10, 2024 at 02:57:14PM -0800, Linus Torvalds wrote:
> > > > > On Wed, 10 Jan 2024 at 14:41, Linus Torvalds
> > > > > <torvalds@xxxxxxxxxxxxxxxxxxxx> wrote:
> > > > > >
> > > > > > It's one of these two:
> > > > > >
> > > > > > f12560779f9d sched/cpufreq: Rework iowait boost
> > > > > > 9c0b4bb7f630 sched/cpufreq: Rework schedutil governor performance estimation
> > > > > >
> > > > > > one more boot to go, then I'll try to revert whichever causes my
> > > > > > machine to perform horribly much worse.
> > > > >
> > > > > I guess it should come as no surprise that the result is
> > > > >
> > > > > 9c0b4bb7f6303c9c4e2e34984c46f5a86478f84d is the first bad commit
> > > > >
> > > > > but to revert cleanly I will have to revert all of
> > > > >
> > > > > b3edde44e5d4 ("cpufreq/schedutil: Use a fixed reference frequency")
> > > > > f12560779f9d ("sched/cpufreq: Rework iowait boost")
> > > > > 9c0b4bb7f630 ("sched/cpufreq: Rework schedutil governor performance estimation")
> > > > >
> > > > > This is on a 32-core (64-thread) AMD Ryzen Threadripper 3970X, fwiw.
> > > > >
> > > > > I'll keep that revert in my private test-tree for now (so that I have
> > > > > a working machine again), but I'll move it to my main branch soon
> > > > > unless somebody has a quick fix for this problem.
> > > >
> > > > Hi Linus,
> > > >
> > > > I'm able to reproduce this issue on my AMD Ryzen 5600G system, but
> > > > only if I disable CPPC in the BIOS and boot with acpi-cpufreq + schedutil.
> > > > (I believe CPPC is also disabled in your case, since the log message
> > > > "_CPC object is not present" appeared.) With CPPC enabled in the BIOS,
> > > > the issue is not seen on my system. On AMD, acpi-cpufreq also uses the
> > > > _CPC object to determine the boost ratio. When CPPC is disabled in the
> > > > BIOS, something goes wrong and the max capacity becomes zero.
> > > >
> > > > Hi Vincent, Qais,
> > > >
>
> ...
>
> > >
> > > There is something strange that I don't understand.
> > >
> > > Could you trace the value of sg_cpu->util on return from
> > > sugov_get_util()?
> >
> > Yes, you're right: something was wrong in my bpftrace readings. max_cap
> > is not zero in the traces.
> >
> > git-5511 [001] d.h1. 427.159763: get_next_freq.constprop.0: [DEBUG] : freq 1400000, util 1024, max 1024
> > git-5511 [001] d.h1. 427.163733: sugov_get_util: [DEBUG] : util 1024, sg_cpu->util 1024
> > git-5511 [001] d.h1. 427.163735: get_next_freq.constprop.0: [DEBUG] : freq 1400000, util 1024, max 1024
> > git-5511 [001] d.h1. 427.167706: sugov_get_util: [DEBUG] : util 1024, sg_cpu->util 1024
> > git-5511 [001] d.h1. 427.167708: get_next_freq.constprop.0: [DEBUG] : freq 1400000, util 1024, max 1024
> > git-5511 [001] d.h1. 427.171678: sugov_get_util: [DEBUG] : util 1024, sg_cpu->util 1024
> > git-5511 [001] d.h1. 427.171679: get_next_freq.constprop.0: [DEBUG] : freq 1400000, util 1024, max 1024
> > git-5511 [001] d.h1. 427.175653: sugov_get_util: [DEBUG] : util 1024, sg_cpu->util 1024
> > git-5511 [001] d.h1. 427.175655: get_next_freq.constprop.0: [DEBUG] : freq 1400000, util 1024, max 1024
> > git-5511 [001] d.s1. 427.175665: sugov_get_util: [DEBUG] : util 1024, sg_cpu->util 1024
> > git-5511 [001] d.s1. 427.175665: get_next_freq.constprop.0: [DEBUG] : freq 1400000, util 1024, max 1024
> >
> > Debug patch applied:
> >
> > diff --git a/kernel/sched/cpufreq_schedutil.c b/kernel/sched/cpufreq_schedutil.c
> > index 95c3c097083e..5c9b3e1de7a0 100644
> > --- a/kernel/sched/cpufreq_schedutil.c
> > +++ b/kernel/sched/cpufreq_schedutil.c
> > @@ -166,6 +166,7 @@ static unsigned int get_next_freq(struct sugov_policy *sg_policy,
> >  
> >  	freq = get_capacity_ref_freq(policy);
> >  	freq = map_util_freq(util, freq, max);
> > +	trace_printk("[DEBUG] : freq %u, util %lu, max %lu\n", freq, util, max);
> >  
> >  	if (freq == sg_policy->cached_raw_freq && !sg_policy->need_freq_update)
> >  		return sg_policy->next_freq;
> > @@ -199,6 +200,7 @@ static void sugov_get_util(struct sugov_cpu *sg_cpu, unsigned long boost)
> >  	util = max(util, boost);
> >  	sg_cpu->bw_min = min;
> >  	sg_cpu->util = sugov_effective_cpu_perf(sg_cpu->cpu, util, min, max);
> > +	trace_printk("[DEBUG] : util %lu, sg_cpu->util %lu\n", util, sg_cpu->util);
> >  }
> >  
> >  /**
> >
> >
> > So I guess map_util_freq() is going wrong somewhere.
>
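> Thanks for the trace. It was really helpful, and I think I have found the
> root cause.
>
> The problem comes from get_capacity_ref_freq(), which returns the current
> frequency when arch_scale_freq_invariant() is not enabled, combined with
> the fact that map_util_perf() is now applied earlier in the path and its
> result is then capped by the max capacity. With util capped at max,
> map_util_freq() returns policy->cur itself, so the requested frequency
> never rises above the current one.
>
> Could you try the patch below?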
>
> diff --git a/kernel/sched/cpufreq_schedutil.c b/kernel/sched/cpufreq_schedutil.c
> index e420e2ee1a10..611c621543f4 100644
> --- a/kernel/sched/cpufreq_schedutil.c
> +++ b/kernel/sched/cpufreq_schedutil.c
> @@ -133,7 +133,7 @@ unsigned long get_capacity_ref_freq(struct cpufreq_policy *policy)
>  	if (arch_scale_freq_invariant())
>  		return policy->cpuinfo.max_freq;
>  
> -	return policy->cur;
> +	return policy->cur + policy->cur >> 2;
>  }
>  
>  /**

The issue seems to be fixed with this (slightly modified by me to fix the operator precedence):

patch:

@@ -133,7 +133,7 @@ unsigned long get_capacity_ref_freq(struct cpufreq_policy *policy)
 	if (arch_scale_freq_invariant())
 		return policy->cpuinfo.max_freq;
 
-	return policy->cur;
+	return policy->cur + (policy->cur >> 2);
 }
 
 /**

trace:
make-7912 [001] d..2. 182.070005: sugov_get_util: [DEBUG] : util 595, sg_cpu->util 743
make-7912 [001] d..2. 182.070006: get_next_freq.constprop.0: [DEBUG] : freq 3537231, util 743, max 1024
sh-7956 [001] d..2. 182.070494: sugov_get_util: [DEBUG] : util 835, sg_cpu->util 1024
sh-7956 [001] d..2. 182.070495: get_next_freq.constprop.0: [DEBUG] : freq 4875000, util 1024, max 1024
sh-7956 [001] d..2. 182.070576: sugov_get_util: [DEBUG] : util 955, sg_cpu->util 1024
sh-7956 [001] d..2. 182.070576: get_next_freq.constprop.0: [DEBUG] : freq 4875000, util 1024, max 1024
sh-7957 [001] d.h1. 182.072120: sugov_get_util: [DEBUG] : util 990, sg_cpu->util 1024
sh-7957 [001] d.h1. 182.072121: get_next_freq.constprop.0: [DEBUG] : freq 4875000, util 1024, max 1024
nm-7957 [001] dNh1. 182.076088: sugov_get_util: [DEBUG] : util 1024, sg_cpu->util 1024
nm-7957 [001] dNh1. 182.076089: get_next_freq.constprop.0: [DEBUG] : freq 4875000, util 1024, max 1024
grep-7958 [001] d..2. 182.076833: sugov_get_util: [DEBUG] : util 1024, sg_cpu->util 1024


Thanks,
Wyes