Re: [RFC PATCH 0/3] clk: sunxi-ng: Optimize rate selection for NKM clocks

From: Maxime Ripard
Date: Fri Jun 02 2023 - 03:32:09 EST


Hi,

On Thu, Jun 01, 2023 at 07:16:45AM +0200, Frank Oltmanns wrote:
> On 2023-05-31 at 15:48:43 +0200, Maxime Ripard <mripard@xxxxxxxxxx> wrote:
> > [[PGP Signed Part:Undecided]]
> > Hi Frank,
> >
> > On Sat, May 27, 2023 at 03:27:44PM +0200, Frank Oltmanns wrote:
> >> I would like to bring your attention to the current process of setting
> >> the rate of an NKM clock. As it stands, when setting the rate of an
> >> NKM clock, the rate nearest but less than or equal to the requested
> >> rate is found, instead of the nearest rate.
> >
> > Yeah, it's actually pretty common, see clk_mux_determine_rate_flags()
> > for example. Some devices require that we don't overshoot, while some
> > prefer to have the closest rate.
> >
> > Both are fine, and it's a bit context specific which one we should
> > favour. If we were to do anything, it would be to support both and let
> > the clock driver select which behaviour it wants.
> >
>
> Ok, understood. Thank you for the explanation! Now, I'm wondering if
> anyone would be using such a flag, if I added it.

I guess that's another thing :) If no-one is going to use it, why should
we do it in the first place?

But most likely the display and audio clocks are usually fairly ok with
overshooting a bit, and a closest rate is usually better.

> >> Moreover, ccu_nkm_find_best() is called multiple times (footnote [1])
> >> when setting a rate, each time iterating over all combinations of n,
> >> k, and m.
> >
> > Yeah, that's expected as well.
>
> I'm wondering though, if iterating over all combinations is set in
> stone, or if some kind of optimization would be in order.

The thing with optimization is that you need to optimize for something.
So you need to show that this code is suboptimal (by whatever metric you
want to optimize for), and that your code is more optimal that it used
to be.

It means identifying a problem, writing benchmarks, and showing that the
new code performs better there.

For example, your code might very well be faster, but it will increase
the kernel image (and thus the RAM usage). One is not more optimal than
the other in absolute, they both are, using a different metric.

And then you have the more social factors that play a huge part too:
readibility, maintainability, etc.

> >> In response to this, I propose the following refinements to optimize the NKM
> >> clock setting:
> >> a. when finding the best rate use the nearest rate, even if it is greater than
> >> the requested rate (PATCH 1)
> >> b. utilize binary search to find the best rate by going through a
> >> precalculated, ordered list of all meaningful combinations of n, k, and m
> >> (PATCH 2)
> >
> > One thing you haven't really addressed is why we would be doing this? Is
> > there some clocks that require a more precise clock and don't? Is the
> > factor calculation a bottleneck for some workloads?
>
> Background
> ==========
> I'm a pinephone user (ccu-sun50i-a64). I'm using U-Boot which sets the
> pll-video0 to 294 MHz on boot. The phone's panel requires DCLK to run at
> 108 MHz to get a nice 60 Hz vertical refresh rate. The clock structure
> is this:
>
> clock clock type
> --------------------------------------
> pll-video0 ccu_nm
> pll-mipi ccu_nkm
> tcon0 ccu_mux
> tcon-data-clock sun4i_dclk
>
> The divider between tcon0 and tcon-data-clock is fixed at 4. So, I need
> pll-mipi to run at 432 MHz to get the desired vertical refresh rate.
> When pll-vdeo0 is at 294 MHz this is that rate cannot be matched exactly
> with any combination. The best we can get is 431.2 MHz (n=11, k=2,
> m=15).
>
> The pinephone has some "vendor" patches (megi kernel) that
> a. add HDMI
> b. allow re-setting pll-mipi's rate when pll-video0 changes
>
> Re: Who needs a more precise clock?
> ===================================
> When plugging in HDMI, pll-video's rate is set to 297 MHz, which - in
> the vendor kernel, not mainline - triggers recalculation of pll-mipi
> (trying to set it to 431.2 MHz). It ends up with a rate of 424.285714
> MHz, because this is the nearest, but less than 431.2 MHz (n=5, k=2,
> m=7). The nearest rate would be 432 MHz.
>
> So, while analyzing the whole situation that I described above, I found
> out that the NKM clocks are not set to the closest rate and wondered why
> that is. Hence my request for comments.

It all makes sense, but I'm not sure why it requires a complete rewrite
of the factor calculation algo?

> Now, one could argue that pll-video0 should be set to 297MHz at boot

Not really no, we should strive to be as immune as possible from the
boot state.

> or that pll-mipi should try to set the *requested* rate instead of the
> previous rate when the pll-video0 changes.

It's not clear to me what is the distinction you make here between the
requested rate and the previous rate?

Do you mean that you have two clk_set_rate in sequence, with the first
one on pll-mipi (or one of its child clocks), and the second one on
pll-video0 (but on a different sub-tree than pll-mipi) and thus pll-mipi
has its rate changed by the second, but doesn't match the previous rate
enforced?

If so, that's kind of expected: clk_set_rate doesn't provide any
warranty on for how long the rate is going to stay there. There's two
way to prevent that. Either we call clk_set_rate_exclusive (on the
first) instead, but it will effectively lock a clock subtree and prevent
any rate change which is a pretty big constraint. Or you add a notifier
to adjust to the parent clock rate change and still provide the same
output rate, with a different set of parameters.

Maxime

Attachment: signature.asc
Description: PGP signature