Re: [GIT PULL] Scheduler changes for v6.8

From: Qais Yousef
Date: Sun Jan 14 2024 - 14:58:36 EST


On 01/14/24 16:20, Vincent Guittot wrote:
> On Sun, 14 Jan 2024 at 16:12, Qais Yousef <qyousef@xxxxxxxxxxx> wrote:
> >
> > On 01/14/24 14:03, Vincent Guittot wrote:
> >
> > > Thanks for the trace. It was really helpful and I think that I got the
> > > root cause.
> > >
> > > The problem comes from get_capacity_ref_freq() which returns current
> > > freq when arch_scale_freq_invariant() is not enabled, and the fact that
> > > we apply map_util_perf() earlier in the path now which is then capped
> > > by max capacity.
> > >
> > > Could you try the below ?
> > >
> > > diff --git a/kernel/sched/cpufreq_schedutil.c b/kernel/sched/cpufreq_schedutil.c
> > > index e420e2ee1a10..611c621543f4 100644
> > > --- a/kernel/sched/cpufreq_schedutil.c
> > > +++ b/kernel/sched/cpufreq_schedutil.c
> > > @@ -133,7 +133,7 @@ unsigned long get_capacity_ref_freq(struct cpufreq_policy *policy)
> > > if (arch_scale_freq_invariant())
> > > return policy->cpuinfo.max_freq;
> > >
> > > - return policy->cur;
> > > + return policy->cur + (policy->cur >> 2);
> > > }
> > >
> > > /**
> >
> > Is this a test patch or a proper fix? I can't see it being the latter. It seems
>
> It's a proper fix. It's the same mechanism that is used already :
> - Either you add margin on the utilization to go above current freq
> before it is fully used. This is what was done previously
> - or you add margin on the freq range to select a higher freq than
> current one before it becomes fully used

Aren't we applying the 25% headroom twice then?

>
> > the current logic fails when util is already 1024, and I think we're trying to
> > fix the invariance issue too late.
> >
> > Is the problem that we can't read policy->cur in the scheduler to fix the util
> > while it's being updated that's why it's done here in this case?
> >
> > If this is the problem, shouldn't the logic be if util is max then always go to
> > max frequency? I don't think we have enough info to correct the invariance here
> > IIUC. All we can see is that the system is saturated at this frequency and whether
> > a small jump or a big jump is required is hard to tell.
> >
> > Something like this
> >
> > diff --git a/kernel/sched/cpufreq_schedutil.c b/kernel/sched/cpufreq_schedutil.c
> > index 95c3c097083e..473d0352030b 100644
> > --- a/kernel/sched/cpufreq_schedutil.c
> > +++ b/kernel/sched/cpufreq_schedutil.c
> > @@ -164,8 +164,12 @@ static unsigned int get_next_freq(struct sugov_policy *sg_policy,
> > struct cpufreq_policy *policy = sg_policy->policy;
> > unsigned int freq;
> >
> > - freq = get_capacity_ref_freq(policy);
> > - freq = map_util_freq(util, freq, max);
> > + if (util != max) {
> > + freq = get_capacity_ref_freq(policy);
> > + freq = map_util_freq(util, freq, max);
> > + } else {
> > + freq = policy->cpuinfo.max_freq;
> > + }
>
> This is not correct because you will have to wait to reach full
> utilization at the current OPP possibly the lowest OPP before moving
> directly to max OPP

Isn't this already the case? The ratio (util+headroom/max) will be less than
1 until util is 80% (with 25% headroom). And for all values <= 80% * max, we
will request a frequency smaller than or equal to policy->cur, no?

ie:

util = 600
max = 1024

freq = 1.25 * 600 * policy->cur / 1024 = 0.73 * policy->cur

(util+headroom/max) must be greater than 1 for us to start going above
policy->cur - which seems to have been working by accident IIUC.
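
To make the arithmetic above concrete, here is a minimal user-space sketch (not
the kernel code). The helper names are made up for illustration, and it assumes
the 25% headroom is computed as util + util/4 and that the reference frequency
is policy->cur, i.e. the non-invariant case:

```c
/* User-space sketch, not kernel code. All helper names are hypothetical. */

/* 25% headroom, assumed to be util + util/4 */
static unsigned long add_headroom(unsigned long util)
{
	return util + (util >> 2);
}

/* freq = ref_freq * util / max, the same shape as the util to freq mapping */
static unsigned long scale_freq(unsigned long util, unsigned long ref_freq,
				unsigned long max)
{
	return ref_freq * util / max;
}

/* Requested frequency (kHz) when the reference is the current frequency */
static unsigned long next_freq_non_invariant(unsigned long util,
					     unsigned long cur_freq,
					     unsigned long max)
{
	return scale_freq(add_headroom(util), cur_freq, max);
}
```

With cur_freq = 2200000 and max = 1024, util = 600 gives a request below
cur_freq, and the request only exceeds cur_freq once util goes past roughly 80%
of max (around 820 with this integer math).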

So yes my proposal is incorrect, but the conversion doesn't seem right to me
now either.

I could reproduce the problem now (thanks Wyes!). I have 3 freqs on my system

2.2GHz, 2.8GHz and 3.8GHz

which (I believe) translates into capacities

~592, ~754, 1024

which means we should pick 2.8GHz as soon as util * 1.25 > 592; which
translates into util = ~473.
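
For reference, this is roughly how I got those numbers, as a user-space sketch.
The function names are hypothetical, and it assumes capacity scales linearly
with frequency against the 3.8GHz maximum and that the headroom is a plain
factor of 1.25:

```c
/* User-space sketch, not kernel code. All helper names are hypothetical. */

/* Capacity of an OPP relative to the max frequency, scaled to 1024 */
static unsigned long freq_to_capacity(unsigned long freq_khz,
				      unsigned long max_freq_khz)
{
	return freq_khz * 1024UL / max_freq_khz;
}

/*
 * Largest util that still fits in a given capacity once 25% headroom
 * is added: util * 1.25 <= capacity, i.e. util <= capacity * 4 / 5.
 */
static unsigned long max_util_for_capacity(unsigned long capacity)
{
	return capacity * 4 / 5;
}
```

freq_to_capacity() on 2200000, 2800000 and 3800000 kHz rounds down to 592, 754
and 1024, and max_util_for_capacity(592) is 473, so anything above that should
no longer fit at 2.2GHz.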

But what I see is that we go to 2.8GHz when we jump from 650 to 680 (see
attached picture). That is what you'd expect if we apply two headrooms now,
since the ratio (util+headroom/max) will only be greater than 1 once we go
above this value

1024 * 0.8 * 0.8 = ~655
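
A small user-space sketch of that double-headroom case, assuming the headroom
really is applied twice and each application is util + util/4 (the function
names are made up for illustration, this is not the kernel code):

```c
/* User-space sketch, not kernel code. All helper names are hypothetical. */

/* 25% headroom, assumed to be util + util/4 */
static unsigned long add_headroom(unsigned long util)
{
	return util + (util >> 2);
}

/*
 * Requested frequency (kHz) if the headroom ends up applied twice and
 * the reference frequency is the current one (max capacity is 1024).
 */
static unsigned long request_with_two_headrooms(unsigned long util,
						unsigned long cur_freq)
{
	unsigned long boosted = add_headroom(add_headroom(util));

	return cur_freq * boosted / 1024UL;
}
```

With cur_freq = 2200000, util = 650 still yields a request below cur_freq while
util = 680 yields one above it, which lines up with the 650 to 680 jump I see
in the trace.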

So I think the math makes sense logically, but we're missing some other
correction factor.

When I re-enable CPPC I see for the same test that we go to 3.8GHz straight
away. My test is a simple busy loop via

cat /dev/zero > /dev/null

I see the CPU util_avg is at 523 at fork. I expected us to run to 2.8GHz here
to be honest, but I am not sure if util_cfs_boost() and util_est() are maybe
causing us to be slightly above 523 and that's why we start with max freq.

Or I've done the math wrong :-) But the two don't behave the same for the same
kernel with and without CPPC.

>
> >
> > if (freq == sg_policy->cached_raw_freq && !sg_policy->need_freq_update)
> > return sg_policy->next_freq;

Attachment: cppc_freq_fix.png
Description: PNG image