Re: [PATCH v2 1/2] clk: composite: Fix handling of high clock rates

From: Stephen Boyd
Date: Mon Jun 12 2023 - 20:10:51 EST


Quoting Sebastian Reichel (2023-05-26 10:10:56)
> ULONG_MAX is used by a few drivers to figure out the highest available
> clock rate via clk_round_rate(clk, ULONG_MAX). Since abs() takes a
> signed value as input, the current logic effectively calculates with
> ULONG_MAX = -1, which results in the worst parent clock being chosen
> instead of the best one.
>
> For example on Rockchip RK3588 the eMMC driver tries to figure out
> the highest available clock rate. There are three parent clocks
> available resulting in the following rate diffs with the existing
> logic:
>
> GPLL: abs(18446744073709551615 - 1188000000) = 1188000001
> CPLL: abs(18446744073709551615 - 1500000000) = 1500000001
> XIN24M: abs(18446744073709551615 - 24000000) = 24000001

I had to read the abs() macro to understand this and also look at the
types for 'req->rate' and 'tmp_req.rate' (both are unsigned long) to
understand what's going on. I'm not sure I'd say that abs() takes the
input as a signed value. Instead, it converts the input to a signed type
to figure out if it should negate the value or not. The problem is the
subtraction result is larger than can fit in a signed long long on a
64-bit machine, so we can't use the macro at all if the type is unsigned
long and the sign bit is set.

>
> As a result the clock framework will promote a maximum supported
> clock rate of 24 MHz, even though 1.5GHz are possible. With the
> updated logic any casting between signed and unsigned is avoided
> and the numbers look like this instead:
>
> GPLL: 18446744073709551615 - 1188000000 = 18446744072521551615
> CPLL: 18446744073709551615 - 1500000000 = 18446744072209551615
> XIN24M: 18446744073709551615 - 24000000 = 18446744073685551615
>
> As a result the parent with the highest acceptable rate is chosen
> instead of the parent clock with the lowest one.
>
> Cc: stable@xxxxxxxxxxxxxxx
> Fixes: 49502408007b ("mmc: sdhci-of-dwcmshc: properly determine max clock on Rockchip")
> Tested-by: Christopher Obbard <chris.obbard@xxxxxxxxxxxxx>
> Signed-off-by: Sebastian Reichel <sebastian.reichel@xxxxxxxxxxxxx>
> ---
> drivers/clk/clk-composite.c | 5 ++++-
> 1 file changed, 4 insertions(+), 1 deletion(-)
>
> diff --git a/drivers/clk/clk-composite.c b/drivers/clk/clk-composite.c
> index edfa94641bbf..66759fe28fad 100644
> --- a/drivers/clk/clk-composite.c
> +++ b/drivers/clk/clk-composite.c
> @@ -119,7 +119,10 @@ static int clk_composite_determine_rate(struct clk_hw *hw,
> if (ret)
> continue;
>
> - rate_diff = abs(req->rate - tmp_req.rate);
> + if (req->rate >= tmp_req.rate)
> + rate_diff = req->rate - tmp_req.rate;
> + else
> + rate_diff = tmp_req.rate - req->rate;

This problem is widespread

$ git grep abs\(.*- -- drivers/clk/ | wc -l
52

so we may have a bigger problem here. Maybe some sort of coccinelle
script or smatch checker can be written to look for abs() usage with an
unsigned long type or a subtraction expression. This may also be worse
after converting drivers to clk_hw_forward_rate_request() and
clk_hw_init_rate_request() because those set the rate to ULONG_MAX.
+Maxime for that as an FYI.

Maybe we need an abs_diff() macro instead, that checks the type and on
CONFIG_64BIT uses a conditional like above, but if it is a smaller type
then it just uses abs() on the expression because it knows the
difference will fit in the signed type conversion. I see that such a
macro exists in some drm driver, so maybe it can be promoted to
linux/math.h and then every grep hit above can use this macro instead.
Care to take that on?

Either way, I've applied this to clk-fixes as it is a regression. I'm
just worried that this problem is more extensive.