Re: [RFC PATCH net-next v3 3/8] net: pcs: pcs-mtk-lynxi: add platform driver for MT7988

From: Russell King (Oracle)
Date: Wed Dec 13 2023 - 11:04:47 EST


On Tue, Dec 12, 2023 at 03:47:18AM +0000, Daniel Golle wrote:
> Introduce a proper platform MFD driver for the LynxI (H)SGMII PCS which
> is going to initially be used for the MT7988 SoC.
>
> Signed-off-by: Daniel Golle <daniel@xxxxxxxxxxxxxx>

I made some specific suggestions about what I wanted to see for
"getting" PCS in the previous review, and I'm disappointed that this
patch set is still inventing its own solution.

> +struct phylink_pcs *mtk_pcs_lynxi_get(struct device *dev, struct device_node *np)
> +{
> + struct platform_device *pdev;
> + struct mtk_pcs_lynxi *mpcs;
> +
> + if (!np)
> + return NULL;
> +
> + if (!of_device_is_available(np))
> + return ERR_PTR(-ENODEV);
> +
> + if (!of_match_node(mtk_pcs_lynxi_of_match, np))
> + return ERR_PTR(-EINVAL);
> +
> + pdev = of_find_device_by_node(np);
> + if (!pdev || !platform_get_drvdata(pdev)) {

This is racy - as I thought I described before, userspace can unbind
the device in one thread, while another thread is calling this
function. With just the right timing, this check succeeds, but...

> + if (pdev)
> + put_device(&pdev->dev);
> + return ERR_PTR(-EPROBE_DEFER);
> + }
> +
> + mpcs = platform_get_drvdata(pdev);

mpcs ends up being read as NULL here. Even if you did manage to get a
valid pointer, "mpcs" being devm-alloced could be freed from under
you at this point...

> + device_link_add(dev, mpcs->dev, DL_FLAG_AUTOREMOVE_CONSUMER);

resulting in this accessing memory which has been freed.

The solution would be either to suppress the bind/unbind attributes
(provided the underlying struct device can't go away, which probably
also means ensuring the same of the MDIO bus). Alternatively, adding
a lock around the remove path and around the checking of
platform_get_drvdata() down to adding the device link would probably
solve it.
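
To illustrate the locking alternative, here is a minimal userspace model,
not kernel code. Every name in it is hypothetical: "drvdata" stands in for
what platform_get_drvdata() returns, the pthread mutex stands in for a
kernel mutex, and "refs" stands in for the device link being added. The
point it shows is only that the NULL check and the use of the pointer sit
inside the same critical section as the remove path.

```c
#include <pthread.h>
#include <stddef.h>

/* Hypothetical stand-in for the driver's private PCS structure. */
struct model_pcs {
	int refs;	/* stands in for device_link_add() on the consumer */
};

/* Hypothetical stand-in for the platform device and its drvdata. */
struct model_dev {
	pthread_mutex_t lock;		/* serialises get against remove */
	struct model_pcs *drvdata;	/* NULL once the driver is unbound */
};

/*
 * Check drvdata and take the reference under one lock, so a concurrent
 * remove cannot clear the pointer between the check and the use.
 */
static struct model_pcs *model_pcs_get(struct model_dev *d)
{
	struct model_pcs *p;

	pthread_mutex_lock(&d->lock);
	p = d->drvdata;
	if (p)
		p->refs++;
	pthread_mutex_unlock(&d->lock);

	return p;	/* NULL here would map to -EPROBE_DEFER */
}

/* The remove path takes the same lock before clearing drvdata. */
static void model_pcs_remove(struct model_dev *d)
{
	pthread_mutex_lock(&d->lock);
	d->drvdata = NULL;
	pthread_mutex_unlock(&d->lock);
}
```

The model deliberately leaves out what happens to consumers that already
hold a reference when remove runs; in the kernel that is the job of the
device link (or of whatever replaces it).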

However, I come back to my general point - this kind of stuff is
hairy. Do we want N different implementations of it in various drivers
with subtle bugs, or do we want _one_ implementation?

If we go with the one implementation approach, then we need to think
about whether we should be using device links or not. The problem
could be for network interfaces where one struct device is
associated with multiple network interfaces. Using device links has
the unfortunate side effect that if the PCS for one of those network
interfaces is removed, _all_ network interfaces disappear.

My original suggestion was to hook into phylink to cause that to
take the link down when an in-use PCS gets removed.

> +
> + return &mpcs->pcs;
> +}
> +EXPORT_SYMBOL(mtk_pcs_lynxi_get);
> +
> +void mtk_pcs_lynxi_put(struct phylink_pcs *pcs)
> +{
> + struct mtk_pcs_lynxi *cur, *mpcs = NULL;
> +
> + if (!pcs)
> + return;
> +
> + mutex_lock(&instance_mutex);
> + list_for_each_entry(cur, &mtk_pcs_lynxi_instances, node)
> + if (pcs == &cur->pcs) {
> + mpcs = cur;
> + break;
> + }
> + mutex_unlock(&instance_mutex);

I don't see what this loop gains us, other than checking that the "pcs"
is still on the list and hasn't already been removed. If that is all
that this is about, then I would suggest:

	bool found = false;

	if (!pcs)
		return;

	mpcs = pcs_to_mtk_pcs_lynxi(pcs);
	mutex_lock(&instance_mutex);
	list_for_each_entry(cur, &mtk_pcs_lynxi_instances, node)
		if (cur == mpcs) {
			found = true;
			break;
		}
	mutex_unlock(&instance_mutex);

	if (WARN_ON(!found))
		return;

which makes it more obvious why this exists.
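
For anyone who wants to play with the idea outside the kernel, the same
check can be modelled in userspace. This is a sketch under assumptions:
all names are hypothetical, a singly linked list replaces the kernel
list, locking is omitted, and model_container_of() mimics what I assume
pcs_to_mtk_pcs_lynxi() does, i.e. recover the containing structure from
the embedded pcs member.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for struct phylink_pcs. */
struct model_pcs {
	int id;
};

/* Hypothetical stand-in for struct mtk_pcs_lynxi with an embedded pcs. */
struct model_mpcs {
	struct model_mpcs *next;	/* simple list instead of list_head */
	struct model_pcs pcs;
};

/* Userspace equivalent of the kernel's container_of(). */
#define model_container_of(ptr, type, member) \
	((type *)((char *)(ptr) - offsetof(type, member)))

/*
 * Convert the pcs pointer to its container first, then only use the
 * list walk to confirm that the instance is still registered.
 */
static bool model_instance_registered(struct model_mpcs *head,
				      struct model_pcs *pcs)
{
	struct model_mpcs *mpcs = model_container_of(pcs, struct model_mpcs, pcs);
	struct model_mpcs *cur;

	for (cur = head; cur; cur = cur->next)
		if (cur == mpcs)
			return true;

	return false;
}
```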

--
RMK's Patch system: https://www.armlinux.org.uk/developer/patches/
FTTP is here! 80Mbps down 10Mbps up. Decent connectivity at last!