Re: [RFC PATCH 0/3] clk: sunxi-ng: Optimize rate selection for NKM clocks

From: Maxime Ripard
Date: Tue Jun 06 2023 - 10:03:44 EST


On Mon, Jun 05, 2023 at 12:31:41PM +0200, Frank Oltmanns wrote:
> Hi Maxime,
>
> On 2023-06-02 at 09:31:59 +0200, Maxime Ripard <mripard@xxxxxxxxxx> wrote:
> > [[PGP Signed Part:Undecided]]
> > Hi,
> >
> > On Thu, Jun 01, 2023 at 07:16:45AM +0200, Frank Oltmanns wrote:
> >> On 2023-05-31 at 15:48:43 +0200, Maxime Ripard <mripard@xxxxxxxxxx> wrote:
> >> > [[PGP Signed Part:Undecided]]
> >> > Hi Frank,
> >> >
> >> > On Sat, May 27, 2023 at 03:27:44PM +0200, Frank Oltmanns wrote:
> >> >> I would like to bring your attention to the current process of setting
> >> >> the rate of an NKM clock. As it stands, when setting the rate of an
> >> >> NKM clock, the rate nearest but less than or equal to the requested
> >> >> rate is found, instead of the nearest rate.
> >> >
> >> > Yeah, it's actually pretty common, see clk_mux_determine_rate_flags()
> >> > for example. Some devices require that we don't overshoot, while some
> >> > prefer to have the closest rate.
> >> >
> >> > Both are fine, and it's a bit context specific which one we should
> >> > favour. If we were to do anything, it would be to support both and let
> >> > the clock driver select which behaviour it wants.
> >> >
> >>
> >> Ok, understood. Thank you for the explanation! Now, I'm wondering if
> >> anyone would be using such a flag, if I added it.
> >
> > I guess that's another thing :) If no-one is going to use it, why should
> > we do it in the first place?
> >
> > But most likely the display and audio clocks are usually fairly ok with
> > overshooting a bit, and a closest rate is usually better.
>
> Ok, I dived a bit deeper into this, but, as far as I can tell, the
> closest rate is not used anywhere in the sunxi-ng ccu driver. So, when
> extending, e.g., the NM or NKM clock to support, one must also extend at
> least the mux clocks, because they expect rates less than the requested
> rate. That seems to be quite the undertaking for only a small gain in
> precision.

mux clocks are using __clk_mux_determine_rate which should have the
behaviour you want when CLK_MUX_ROUND_CLOSEST is set.

> >> >> Moreover, ccu_nkm_find_best() is called multiple times (footnote [1])
> >> >> when setting a rate, each time iterating over all combinations of n,
> >> >> k, and m.
> >> >
> >> > Yeah, that's expected as well.
> >>
> >> I'm wondering though, if iterating over all combinations is set in
> >> stone, or if some kind of optimization would be in order.
> >
> > The thing with optimization is that you need to optimize for something.
> > So you need to show that this code is suboptimal (by whatever metric you
> > want to optimize for), and that your code is more optimal that it used
> > to be.
> >
> > It means identifying a problem, writing benchmarks, and showing that the
> > new code performs better there.
> >
> > For example, your code might very well be faster, but it will increase
> > the kernel image (and thus the RAM usage). One is not more optimal than
> > the other in absolute, they both are, using a different metric.
>
> Sure, I get that. I'll submit a patchset that adds the functionality to
> NKM clocks to set the rate of their parents.
>
> With the new patchset, the time for, e.g. setting DCLK increases from
> ~0.5 ms to a whopping 30 - 37 ms. Those times were taken
> unscientifically on my pinephone, i.e. kernel logging and a couple of
> re-boots. But I think that still might give an idea of why I thought
> about the need to increase performance.
>
> The reason for this massive increase is, that the patch iterates over
> all combinations of NKM for pll-mipi, and for each combination it
> iterates over all combinations of NM for pll-video0.
>
> Nevertheless, following your and Jernej's advice, I'll submit the
> patchset first and then we can discuss if speed optimizations are needed
> and what cost is acceptable.

Honestly, for 40ms, it will be a hard sell :)

> >> or that pll-mipi should try to set the *requested* rate instead of the
> >> previous rate when the pll-video0 changes.
> >
> > It's not clear to me what is the distinction you make here between the
> > requested rate and the previous rate?
>
> This is quite a de-tour from the original discussion, so I'm sorry for
> the confusion.
>
> By requested rate I mean the rate that the user (DCLK) requested. But
> this is not necessarily the rate that the clock is using in the end,
> because of its parent's rate.
>
> So, when the pll-video0 changes rate from 294 MHz to 297MHz (upon
> plugging in HDMI), pll-mipi does not know any longer what the requested
> rate (let's say 432MHz) was.

It does, it's struct clk_core's req_rate. It doesn't look like it's
available to clk_hw users, but given the rest of your explanation, I
guess you have a compelling use case to make it available.

Maxime

Attachment: signature.asc
Description: PGP signature