Re: [RFC] cpufreq: send notifications for intermediate (stable) frequencies

From: Stephen Warren
Date: Thu May 15 2014 - 16:51:17 EST


On 05/15/2014 02:39 PM, Doug Anderson wrote:
> Hi,
>
> On Thu, May 15, 2014 at 12:17 PM, Stephen Warren <swarren@xxxxxxxxxxxxx> wrote:
>> On 05/14/2014 11:56 PM, Viresh Kumar wrote:
>>> Douglas Anderson recently pointed out an interesting problem due to which his
>>> udelay() was expiring earlier than it should:
>>> https://lkml.org/lkml/2014/5/13/766
>>>
>>> While transitioning between frequencies, a few platforms may temporarily switch
>>> to a stable frequency while waiting for the main PLL to stabilize.
>>>
>>> For example: When we transition between very low frequencies on exynos, like
>>> between 200MHz and 300MHz, we may temporarily switch to a PLL running at 800MHz.
>>> No CPUFREQ notification is sent for that. That means there's a period of time
>>> when we're running at 800MHz but loops_per_jiffy is calibrated at between 200MHz
>>> and 300MHz. And so udelay behaves badly.
>>>
>>> To get this fixed in a generic way, let's introduce another callback, safe_freq(),
>>> for the cpufreq drivers.
>>>
>>> safe_freq() should return a stable intermediate frequency a platform might want
>>> to switch to, before jumping to the frequency corresponding to 'index'. Core
>>> will send the 'PRE' notification for this 'stable' frequency and 'POST' for the
>>> 'target' frequency. Though if ->target_index() fails, it will handle POST for
>>> 'stable' frequency only.
>>>
>>> Drivers must send 'POST' notification for 'stable' freq and 'PRE' for 'target'
>>> freq. If they can't switch to target frequency, they don't need to send any
>>> notification.
>>
>> This seems rather complex. Can't either the driver or the cpufreq core
>> be responsible for all of the notifications? Otherwise, the logic gets
>> rather complex, and spread between the core and the driver.
>>
>> Perhaps the core should make separate calls into the driver to switch to
>> the temporary frequency and the final frequency, so it can manage all
>> the notifications. Probably best to use a separate function pointer for
>> the temporary change so the driver can easily know what it's doing.
>
> In the discussion about the exynos cpufreq redesign (atop
> cpufreq-cpu0), it turns out that they've come up with a pretty
> reasonable solution that also happens to solve our problem. They
> utilize an extra divider to make sure that the temporary PLL gets
> divided down so that it's low enough.
>
> It might mean that going between 300 MHz and 500 MHz that you will
> transition through 400 MHz, but I'm quite OK with not sending out a
> notification for that.
>
> If something like that could work for tegra, then maybe we can drop
> this whole thing and it will all just fix itself. ;)

In the case of Tegra20 cpufreq at least, I don't think that will be
possible without changing the temporary clock source we use (pll_p).
The PLL that's used temporarily is also the root of all the peripheral
clocks, and hence can't be changed. We also only characterize that PLL
at the one specific frequency it was designed to run at.

That said, it looks like the CPU clock may support pll_p_out3 and
pll_p_out4 as sources in addition to pll_p. I'm not sure if anything
else uses those divided pll_p outputs. Peter, perhaps you can comment?
Also, since pll_p itself runs at exactly 216MHz, pll_p_out3 and
pll_p_out4 can never go higher than that, so we couldn't use this trick
for transitions between two fast clock rates. Note that Tegra20 does
reparenting for any CPU clock rate change, not just when changing
to/from certain slow rates. None of the other potential CPU clock
parents seem any better.
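
To put rough numbers on the udelay() concern discussed in this thread,
here is a minimal sketch. It assumes a purely loop-based delay whose
real duration scales inversely with the CPU clock rate. The helper names
(effective_delay_us, delay_runs_short) and the PLL_P_KHZ macro are
hypothetical and exist only for this illustration; they are not kernel
APIs.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* pll_p rate on Tegra20 as stated in this mail (216MHz), in kHz. */
#define PLL_P_KHZ 216000u

/*
 * Hypothetical helper: real duration, in microseconds, of a delay loop
 * that was asked to wait requested_us while calibrated for
 * calibrated_khz, when the CPU is actually running at actual_khz.
 * Assumes the loop count is fixed by the calibration, so the elapsed
 * time scales as calibrated/actual. The result is truncated.
 */
static uint64_t effective_delay_us(uint64_t requested_us,
                                   uint32_t calibrated_khz,
                                   uint32_t actual_khz)
{
    assert(actual_khz > 0);
    return requested_us * calibrated_khz / actual_khz;
}

/*
 * Hypothetical helper: true if running at actual_khz with a delay loop
 * calibrated for calibrated_khz makes delays expire early, which is the
 * problematic direction.
 */
static bool delay_runs_short(uint32_t calibrated_khz, uint32_t actual_khz)
{
    return actual_khz > calibrated_khz;
}
```

With the exynos numbers quoted above (calibrated for 200MHz, temporarily
running at 800MHz), effective_delay_us(100, 200000, 800000) gives 25, so
a 100us delay lasts only about a quarter of the requested time. With a
216MHz pll_p as the temporary parent and the same 200MHz calibration,
the same call with PLL_P_KHZ gives 92, so the delay is still slightly
short unless a notification is sent or a divided output is used.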

It's possible that later Tegra SoCs have more freedom here, but I didn't
check.