Re: [PATCH v1] Revert "cpufreq: schedutil: Move max CPU capacity to sugov_policy"

From: Lukasz Luba
Date: Wed Nov 30 2022 - 10:01:02 EST




On 11/30/22 14:29, Vincent Guittot wrote:
On Wed, 30 Nov 2022 at 15:04, Lukasz Luba <lukasz.luba@xxxxxxx> wrote:

Hi Vincent,

On 11/30/22 10:42, Vincent Guittot wrote:
Hi All

Just for the log and because it took me a while to figure out the root
cause of the problem: This patch also creates a regression for
snapdragon845 based systems and probably on any QC chipsets that use a
LUT to update the OPP table at boot. The behavior is the same as
described by Sam, with a stale value in the sugov_policy.max field.

Thanks for sharing this info and apologies that you spent cycles
on it.

I have checked the whole setup code (capacity + cpufreq policy and
governor). It looks like, to have the proper capacity of the CPUs, we
need to wait until the last policy is created. This is due to the
arch_topology.c mechanism, which is only triggered after all CPUs have
got their policy. Unfortunately, this leads to a chicken & egg situation
for this schedutil setup of max capacity.

I have experimented with this code, which triggers an update in
schedutil once all CPUs have got their policy and the sugov governor
(with trace_printk() to match the output below)

Your proposal below looks similar to what is done in arch_topology.c.

Yes, even the name 'cpus_to_visit' looks similar ;)

arch_topology.c triggers a rebuild of sched_domain and removes its
cpufreq notifier cb once it has visited all CPUs. Could it also
trigger an update of the CPU's policy with cpufreq_update_policy()?

At this point you will be sure that the normalization has happened and
the max capacity will not change.

True, they are done under that blocking notification chain, for the
last policy init. This is before the last time we call schedutil's
sugov_start() with that last policy. That's why this code is able to
see the properly normalized max capacity under the:
trace_printk("schedutil the visit cpu mask is empty now\n");



I don't know if it's a global problem or only for systems using arch_topology.


It would only be for those with arch_topology, so only our asymmetric
systems AFAICS.