Re: [PATCH] cpufreq: vexpress-spc: Fix wrong alternation of policy->related_cpus during CPU hp

From: Sudeep Holla
Date: Wed Nov 27 2019 - 08:32:06 EST


On Wed, Nov 27, 2019 at 05:44:02PM +0530, Viresh Kumar wrote:
> On 27-11-19, 12:08, Sudeep Holla wrote:
> > On Wed, Nov 27, 2019 at 12:48:01PM +0100, Dietmar Eggemann wrote:
> > > Since commit ca74b316df96 ("arm: Use common cpu_topology structure and
> > > functions.") the core cpumask has to be modified during cpu hotplug
> > > operations.
> > >
> > > ("arm: Fix topology setup in case of CPU hotplug for CONFIG_SCHED_MC")
> > > [1] fixed that but revealed another issue on TC2, i.e. in its cpufreq
> > > driver.
> > >
> > > During CPU hp stress operations on multiple CPUs, policy->related_cpus
> > > can be altered. This is wrong since this cpumask should contain the
> > > online and offline CPUs.
> > >
> > > The WARN_ON(!cpumask_test_cpu(cpu, policy->related_cpus)) in
> > > cpufreq_online() triggers in this case.
> > >
> > > The core cpumask can't be used to set the policy->cpus in
> > > ve_spc_cpufreq_init() anymore in case it is called via
> > > cpuhp_cpufreq_online()->cpufreq_online()->cpufreq_driver->init().
> > >
> > > An empty online() callback can be used to prevent the driver's init()
> > > function from being called during CPU hotplug, so that
> > > policy->related_cpus will not be changed.
> > >
> >
> > Unlike DT based drivers, it is not easy to get the fixed cpumask unless
> > we add some mechanism to extract it based on the clks/OPPs added. I
> > prefer this simple solution instead.
>
> I will call this a work-around for the problem and not really the
> solution, though I won't necessarily oppose it. There are cases which
> will break even with this solution.
>

I agree, and that's the reason I was thinking out loud here :)

> - Boot board with cpufreq driver as module.
> - Offline all CPUs except CPU0.
> - insert cpufreq driver.
> - online all CPUs.
>

Indeed, and not just at boot: any time, since it's a module :)

> Now there is no guarantee that the last online will get the mask
> properly, if I have understood the problem well :)
>

Yes

> But yeah, who does this kind of messy work anyway :)
>

I won't bet on that ;)

> FWIW, we need a proper way (maybe from architecture code) to find the
> list of all CPUs that share a clock line.
>

Yes, but there's no architectural way. I need to revisit tc2_pm.c to
check if we can do any magic there.

--
Regards,
Sudeep