Re: [PATCH v3 5/5] thermal: cpu_cooling: update the cpu device when cpufreq updates the policy cpu

From: Kapileshwar Singh
Date: Tue Mar 03 2015 - 06:41:47 EST




On 03/03/15 11:19, Viresh Kumar wrote:
> On 3 March 2015 at 16:29, Kapileshwar Singh <kapileshwar.singh@xxxxxxx> wrote:
>> We store the device pointer of the lead CPU (policy CPU) in a cpufreq domain as a part of the
>> cpufreq_cooling_device data structure. There is one cpufreq_cooling_device per
>> cpufreq domain.
>>
>> We need the device to find out the current OPP for the cpufreq_cooling_device for our static power calculation.
>>
>> opp = dev_pm_opp_find_freq_exact(cpu_dev, freq_hz, true);
>> voltage = dev_pm_opp_get_voltage(opp);
>>
>>
>> The problem we are trying to solve here is:
>>
>> When this lead CPU gets hotplugged out, the device pointer becomes stale and the policy
>> CPU for the cpufreq domain changes. We then store the new policy CPU's device pointer
>> in the cpufreq_cooling_device when we receive a notification from cpufreq. We are open
>> to suggestions for other ways to solve the problem.
>
> I would have loved it if life were that simple :)
>
> The OPP library today isn't perfect, which is why this series [1] is doing the rounds.
> The problem is that OPPs are initialized per device today, and even if they
> are shared by multiple CPUs, the OPP library doesn't know about it.
>
> So, if policy->cpu goes away, OPP APIs on the new CPU will not work,
> as OPPs are only initialized for one CPU and not for the others within the same
> policy :)
>
> The way cpufreq-dt takes care of this is by saving the cpu_dev of the first
> CPU for which OPPs are initialized and always using that, even if the CPU
> goes away. You need to do exactly the same.
>
> And please do test such a scenario before sending the patches again, as
> it would simply have failed in this case. Have you given it a try?

Yes, I did test the case where we cache the device pointer of the CPU for which the OPPs are populated.
When this CPU is hotplugged out, that invalidates the cached device pointer itself. Here are the errors we get in dmesg:

..
<3>[67203.216774] opp_get_voltage: Invalid parameters
<3>[67203.326774] opp_get_voltage: Invalid parameters
<3>[67203.326774] opp_get_voltage: Invalid parameters
..

This comes from the following check in dev_pm_opp_get_voltage():

unsigned long dev_pm_opp_get_voltage(struct dev_pm_opp *opp)
{
	..
	tmp_opp = rcu_dereference(opp);
	if (unlikely(IS_ERR_OR_NULL(tmp_opp)) || !tmp_opp->available)
		pr_err("%s: Invalid parameters\n", __func__);
	else
		..

The check fails when

opp = dev_pm_opp_find_freq_exact(cpufreq_device->cpu_dev, freq_hz, true);

returns an error pointer or NULL, or when the returned OPP is marked unavailable (the conditions tested above).

Regards,
KP





>
> Once my patchset [1] is applied, life will be much simpler and we can
> call the OPP library for any CPU, but that is going to take some time.
>
> --
> viresh
>
> [1] https://www.marc.info/?l=linaro-kernel&m=142364262800650&w=3
>
