Re: [PATCH v2 1/2] clk: composite: Fix handling of high clock rates

From: Maxime Ripard
Date: Tue Jun 13 2023 - 08:14:37 EST


On Mon, Jun 12, 2023 at 05:10:35PM -0700, Stephen Boyd wrote:
> Quoting Sebastian Reichel (2023-05-26 10:10:56)
> > ULONG_MAX is used by a few drivers to figure out the highest available
> > clock rate via clk_round_rate(clk, ULONG_MAX). Since abs() takes a
> > signed value as input, the current logic effectively calculates with
> > ULONG_MAX = -1, which results in the worst parent clock being chosen
> > instead of the best one.
> >
> > For example on Rockchip RK3588 the eMMC driver tries to figure out
> > the highest available clock rate. There are three parent clocks
> > available resulting in the following rate diffs with the existing
> > logic:
> >
> > GPLL: abs(18446744073709551615 - 1188000000) = 1188000001
> > CPLL: abs(18446744073709551615 - 1500000000) = 1500000001
> > XIN24M: abs(18446744073709551615 - 24000000) = 24000001
>
> I had to read the abs() macro to understand this and also look at the
> types for 'req->rate' and 'tmp_req.rate' (both are unsigned long) to
> understand what's going on. I'm not sure I'd say that abs() takes the
> input as a signed value. Instead, it converts the input to a signed type
> to figure out if it should negate the value or not. The problem is the
> subtraction result is larger than can fit in a signed long long on a
> 64-bit machine, so we can't use the macro at all if the type is unsigned
> long and the sign bit is set.
>
> >
> > As a result the clock framework will promote a maximum supported
> > clock rate of 24 MHz, even though 1.5GHz are possible. With the
> > updated logic any casting between signed and unsigned is avoided
> > and the numbers look like this instead:
> >
> > GPLL: 18446744073709551615 - 1188000000 = 18446744072521551615
> > CPLL: 18446744073709551615 - 1500000000 = 18446744072209551615
> > XIN24M: 18446744073709551615 - 24000000 = 18446744073685551615
> >
> > As a result the parent with the highest acceptable rate is chosen
> > instead of the parent clock with the lowest one.
> >
> > Cc: stable@xxxxxxxxxxxxxxx
> > Fixes: 49502408007b ("mmc: sdhci-of-dwcmshc: properly determine max clock on Rockchip")
> > Tested-by: Christopher Obbard <chris.obbard@xxxxxxxxxxxxx>
> > Signed-off-by: Sebastian Reichel <sebastian.reichel@xxxxxxxxxxxxx>
> > ---
> > drivers/clk/clk-composite.c | 5 ++++-
> > 1 file changed, 4 insertions(+), 1 deletion(-)
> >
> > diff --git a/drivers/clk/clk-composite.c b/drivers/clk/clk-composite.c
> > index edfa94641bbf..66759fe28fad 100644
> > --- a/drivers/clk/clk-composite.c
> > +++ b/drivers/clk/clk-composite.c
> > @@ -119,7 +119,10 @@ static int clk_composite_determine_rate(struct clk_hw *hw,
> > if (ret)
> > continue;
> >
> > - rate_diff = abs(req->rate - tmp_req.rate);
> > + if (req->rate >= tmp_req.rate)
> > + rate_diff = req->rate - tmp_req.rate;
> > + else
> > + rate_diff = tmp_req.rate - req->rate;
>
> This problem is widespread
>
> $ git grep abs\(.*- -- drivers/clk/ | wc -l
> 52
>
> so we may have a bigger problem here. Maybe some sort of coccinelle
> script or smatch checker can be written to look for abs() usage with an
> unsigned long type or a subtraction expression. This may also be worse
> after converting drivers to clk_hw_forward_rate_request() and
> clk_hw_init_rate_request() because those set the rate to ULONG_MAX.
> +Maxime for that as an FYI.

We set max_rate to ULONG_MAX in those functions, and I don't think we
have a lot of driver that will call clk_round_rate on the max rate, so
we should be somewhat ok?

> Maybe we need an abs_diff() macro instead, that checks the type and on
> CONFIG_64BIT uses a conditional like above, but if it is a smaller type
> then it just uses abs() on the expression because it knows the
> difference will fit in the signed type conversion. I see that such a
> macro exists in some drm driver, so maybe it can be promoted to
> linux/math.h and then every grep hit above can use this macro instead.
> Care to take that on?
>
> Either way, I've applied this to clk-fixes as it is a regression. I'm
> just worried that this problem is more extensive.

Yeah, that construct is pretty much everywhere, including in the core :/

Maxime

Attachment: signature.asc
Description: PGP signature