Re: [PATCH] clk: allow to skip clk_core_req_round_rate_nolock()

From: Jan Dakinevich
Date: Sat Mar 09 2024 - 16:21:18 EST




On 2/29/24 05:16, Stephen Boyd wrote:
> Quoting Jan Dakinevich (2024-02-23 13:47:35)
>>
>>
>> On 2/23/24 02:20, Stephen Boyd wrote:
>>> Quoting Jan Dakinevich (2024-01-26 12:14:33)
>>>> Calling of clk_core_req_round_rate_nolock() can be time-consuming in a
>>>> case of deep hierarchy with multiple dividers/parents. But if the clock
>>>> already has exactly the same rate as desired, there is no need to
>>>> determine how it could be rounded.
>>>
>>> What exactly are you trying to avoid? Is this an optimization or a bug
>>> fix? TL;DR: I'm unlikely to apply this patch.
>>>
>>
>> It is an optimization, not a bug. The problem is that
>> clk_core_req_round_rate_nolock() is quite expensive, and I faced with
>> cases, where it takes tens and hundreds milliseconds (depending on SoC).
>>
>> As I see, it is irremovable feature of clk_core_req_round_rate_nolock()
>> design itself. Lets imagine, we have some clock, and its parent is a
>> divider. When clk_core_req_round_rate_nolock() is being called the
>> execution is walked through the following path:
>>
>> clk_core_determine_round_nolock
>> core->ops->determine_rate
>> divider_determine_rate
>> clk_divider_bestdiv
>>
>> Inside clk_divider_bestdiv() for each possible divider
>> clk_hw_round_rate() is called for parent of the clock, which in turn
>> calls clk_core_determine_round_nolock().
>>
>> So, each divider and multiplexer in clock path multiplies many times an
>> amount of iteration required to execute
>> clk_core_req_round_rate_nolock(). When there are a lot of them the time
>> consumed by clk_core_req_round_rate_nolock() becomes sufficient.
>
> Do you have a more concrete example? I wonder if perhaps you've split up
> the clk hardware into multipliers and dividers, when they really could
> all be combined into one clk that does all the math at once without
> traversing the tree. But if the problem is really just that the
> clk_divider_bestdiv() implementation is slow then that's good to know.
>

My experience related to audio stack on Amlogic SoCs. For example, below
is clock hierarchy on AXG SoC:

xtal

hifi_pll_dco

hifi_pll

aud_mst_c_mclk_sel

aud_mst_c_mclk_div

aud_mst_c_mclk

aud_mst_c_sclk_pre_en

aud_mst_c_sclk_div

aud_mst_c_sclk_post_en

aud_mst_c_lrclk_div

aud_mst_c_lrclk

aud_tdmout_c_lrclk

on A1 SoC (which is my target) it will be almost identical, but it is
not upstreamed yet:

xtal

hifipll_in

hifi_pll

audio_mst_a_mclk_mux

audio_mst_a_mclk_div

audio_mst_a_mclk

audio_mst_a_sclk_pre_en

audio_mst_a_sclk_div

audio_mst_a_sclk_post_en

audio_mst_a_lrclk_div

audio_mst_a_lrclk

audio_tdmout_a_lrclk

Clock setting operation that takes too long is here:

https://elixir.bootlin.com/linux/v6.7/source/sound/soc/meson/axg-tdm-interface.c#L279

In both cases there are three divider. When I artificially make that one
of these dividers has single value (using clk_div_table) it
significantly decreases the time that spent by clk_set_rate(): less then
1ms instead ~300ms on A1 SoC.


>>> I could see some driver implementing round_rate()/determine_rate() in a
>>> way that rounds the rate passed in, so that even if the rate is what the
>>> clk is running at _right now_, it still wants to change it to something
>>> else, or at least call down into the driver to call the set_rate clk_op.
>>> Applying this patch will break that. The contract is that
>>> clk_set_rate(rate) == clk_set_rate(clk_round_rate(rate)). It doesn't
>>> look like anything needs to change.
>>
>> If I am not mistaken, clocks's rate is either equal to its parent rate
>> or calculated by ->recalc_rate(). I suppose, this callback should return
>> valid rate value that is based on current clock parameters.
>>
>> Now, suppose the clock has rate "rateA" and we called clk_set_rate() to
>> set "rateA", but clk_core_req_round_rate_nolock() inside clk_set_rate()
>> rounds it to "rateB". Thus, although the clock is able to run on desired
>> rate (and actually run on it), ->determine_rate() and ->round_rate() are
>> unable to choose clocks's parameters for that value. Is it correct
>> behavior for clock driver?
>>
>
> It's not really a question for the clk framework. If the clk driver
> wants to round rateA to rateB then it can. It could be that the
> recalc_rate() clk_op calculates a slightly different rate than what
> round_rate() clk op did, because maybe the driver has frequency tables
> and the rate the clk runs at is something like 933333Hz but the driver
> just says that's 930000Hz for simplicity. If that happens, recalc_rate()
> gives us the "true" rate, while round_rate() gives us the "approximate"
> rate. Either way, the set_rate() clk_op knows that 930000Hz means set
> some clk rate, even if that doesn't match what recalc_rate() returns
> once the rate is changed.
>
> This is very much a real case, because this is essentially how the qcom
> clk driver works.

Ok. Got it.

--
Best regards
Jan Dakinevich